Week 10 - Feature selection

Feature selection is the process of deciding which features of the data are worth training on, then picking the subset of them that matter most. The reasons to do this are:

  • Knowledge discovery, interpretability, and insight.
    • When you get the results back you are going to have to be able to explain them.
  • The curse of dimensionality
    • If you put in too many dimensions, the amount of data you need in order to generalise grows exponentially and the accuracy of your models drops (see the sketch after this list).
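
A minimal sketch of one symptom of the curse, assuming nothing beyond NumPy (the point counts and dimensions are illustrative): with uniformly random points, the gap between the nearest and farthest neighbour collapses as the dimension grows, so distance-based learners lose their signal.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))                      # 500 points in the unit cube
    dists = np.linalg.norm(X[1:] - X[0], axis=1)  # distances from one point to the rest
    print(f"d={d:>4}  nearest/farthest = {dists.min() / dists.max():.3f}")
# the ratio climbs towards 1: every point becomes roughly equally far away
```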

How hard is this problem?

Assume we want to find the best subset of $m$ of the $n$ features. The only way to verify that a subset is the best is to check them all, so that is $\binom{n}{m}$ subsets. If you don't know $m$, it is all $2^n$ subsets.

This problem is known to be NP-hard.
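
A quick back-of-the-envelope check of how fast the search space blows up ($n$ here is illustrative):

```python
from math import comb

n = 50                # number of candidate features (illustrative)
print(comb(n, 5))     # best subset of known size 5: 2,118,760 candidates
print(2 ** n)         # size unknown, so all subsets: 1,125,899,906,842,624
```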

Techniques

There are two main families of techniques to solve this problem:

  • Filtering, a process of deciding which features to keep before passing them to the learning algorithm, and
  • Wrapping, using the learning algorithm to decide which features to keep.

Filtering has the following trade-offs:

  • It is normally faster, as it doesn’t need to retrain models,
  • It is independent of the learning problem, which makes it more generic, but it also gets no feedback from the learner, and
  • Its speed comes from looking at features in isolation, which may not give the full picture.

Wrapping has the following trade-offs:

  • It takes the model’s bias into account and can correct for it, and
  • It can be very slow, depending on the algorithm.

We have already come across methods of doing filtering, e.g. the information gain used to pick splits in decision trees; a minimal sketch is below.
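
As a minimal filtering sketch, assuming nothing beyond NumPy (scoring by correlation with the label is just one possible criterion, and the data here is synthetic): score every column, keep the top k, and only then hand the data to a learner.

```python
import numpy as np

def filter_top_k(X, y, k):
    """Score each column by |correlation with the label| and keep the k best."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    keep = np.sort(np.argsort(scores)[-k:])   # indices of the k highest scores
    return X[:, keep], keep

rng = np.random.default_rng(0)
X = rng.random((200, 10))
y = (X[:, 3] + 0.1 * rng.standard_normal(200) > 0.5).astype(int)

X_filtered, kept = filter_top_k(X, y, k=3)
print(kept)   # column 3 should survive; no model was trained at any point
```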

We have already come across some ways to do wrapping too, e.g. treating the choice of subset as a search problem for randomised optimisation; a forward-selection sketch is below.
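
A minimal wrapping sketch, under the assumption that scikit-learn and a k-NN learner are acceptable stand-ins (the notes don’t fix a learner): greedy forward selection, retraining and cross-validating the model for every candidate feature.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_select(X, y, n_features):
    """Greedily add whichever feature most improves cross-validated accuracy."""
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining and len(chosen) < n_features:
        # the learner is retrained once per candidate -- this is why wrapping is slow
        scores = {j: cross_val_score(KNeighborsClassifier(),
                                     X[:, chosen + [j]], y, cv=5).mean()
                  for j in remaining}
        best = max(scores, key=scores.get)
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(0)
X = rng.random((150, 8))
y = (X[:, 0] + X[:, 5] > 1.0).astype(int)
print(forward_select(X, y, n_features=2))   # expect columns 0 and 5
```

Swapping the inner learner changes which features win, which is exactly the feedback filtering never gets.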

Relevance

Strongly relevant feature

For a learning problem, a feature of the input space is strongly relevant if removing it degrades the Bayes optimal classifier.

Weakly relevant feature

For a learning problem, a feature of the input space is weakly relevant if it is not strongly relevant, but there is some subset of the other features such that adding it to that subset improves the Bayes optimal classifier. For example, if a strongly relevant feature is duplicated, each copy becomes merely weakly relevant: removing one copy alone degrades nothing while the other remains.

Irrelevant feature

For a learning problem, a feature of the input space is irrelevant if it is neither a Strongly relevant feature nor a Weakly relevant feature.

Whilst relevance is defined against the Bayes optimal classifier and so is quite abstract, an irrelevant feature can still help a particular algorithm. Such features are useful.

Useful feature

For a learning problem, learning algorithm, and measure of error, a feature of the input space is useful if including it can decrease the error.

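A minimal sketch of the classic case (the perceptron and the synthetic data here are my assumptions, not fixed by the notes): a constant column says nothing about the label, so it is irrelevant, yet a perceptron that cannot fit an intercept needs it to place its threshold anywhere but the origin.

```python
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
x = rng.random((200, 1))            # a single, strongly relevant feature in [0, 1]
y = (x[:, 0] > 0.5).astype(int)     # the true threshold sits at 0.5, not the origin

# with no intercept the boundary must pass through the origin, so the
# relevant feature alone is not enough: accuracy is roughly chance
print(Perceptron(fit_intercept=False).fit(x, y).score(x, y))

# a constant column says nothing about y (irrelevant), yet appending it
# lets the perceptron represent the 0.5 threshold (useful)
x_aug = np.hstack([x, np.ones_like(x)])
print(Perceptron(fit_intercept=False).fit(x_aug, y).score(x_aug, y))
```

So usefulness is about what a particular learner can exploit, not about information content.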