Week 10 - Feature selection
Feature selection is the process of deciding which features of the data are worth training on, then picking the subset of them that matter most. The reasons to do this are:
- Knowledge discovery, interpretability, and insight.
  - When you get the results back you are going to have to be able to explain them.
- The curse of dimensionality.
  - If you put too many dimensions in, the accuracy of your models can decrease!
How hard is this problem?
Assume we want to find the best subset of $m$ features out of the $n$ available.
This problem is known to be NP-hard.
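To get a feel for the size of the search: an exhaustive search would have to consider every subset of the $n$ features, and the number of subsets grows exponentially,

$$\sum_{m=0}^{n} \binom{n}{m} = 2^n, \qquad \text{e.g. } 2^{100} \approx 1.3 \times 10^{30}.$$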
Techniques
There are two main families of techniques for this problem:
- Filtering, deciding which features to keep before passing them to the learning algorithm, and
- Wrapping, using the learning algorithm to decide which features to keep.
Filtering has the following trade-offs:
- It is normally faster, as it doesn't train models during selection,
- It is independent of the learning algorithm, making it more generic but also meaning it gets no feedback from it, and
- Its speed comes partly from scoring features in isolation, which may not give the full picture.
Wrapping has the following trade-offs:
- It takes the model's inductive bias into account, and
- It can be very slow depending on the algorithm.
We have already come across some methods of doing filtering:
- Information entropy,
- Decision trees do feature selection already, and
- Neural networks do feature selection internally.
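A rough sketch of filtering (using scikit-learn and mutual information as one illustrative choice of scoring function): score each feature against the labels on its own, then keep only the top scorers before any model is trained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Score every feature on its own against the labels - no learner involved.
scores = mutual_info_classif(X, y, random_state=0)
print("mutual information per feature:", np.round(scores, 3))

# Keep only the k highest-scoring features.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)
print("kept columns:", selector.get_support(indices=True), "new shape:", X_reduced.shape)
```

The learning algorithm never sees the discarded columns, which is exactly why filtering is fast but receives no feedback.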
We have already come across some ways to do wrapping too:
- Hill climbing,
- Random optimisation more generally,
- Gradient descent,
- Forward search, where we keep adding features until the score stops improving, and
- Backward search, where we keep removing features until the score starts to drop.
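Forward search is the clearest example of wrapping. A minimal sketch, assuming scikit-learn with a decision tree as the wrapped learner (both are illustrative choices): greedily add whichever feature most improves cross-validated accuracy, and stop when no addition helps.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

selected, best_score = [], 0.0
remaining = list(range(X.shape[1]))

# Greedy forward search: add the single feature that most improves
# cross-validated accuracy; stop when no candidate improves the score.
while remaining:
    candidate_scores = {
        f: cross_val_score(model, X[:, selected + [f]], y, cv=5).mean()
        for f in remaining
    }
    best_feature, score = max(candidate_scores.items(), key=lambda kv: kv[1])
    if score <= best_score:
        break
    selected.append(best_feature)
    remaining.remove(best_feature)
    best_score = score

print("selected features:", selected, "cv accuracy:", round(best_score, 3))
```

Every step retrains and re-evaluates the model once per remaining feature, which is exactly where the slowness of wrapping comes from.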
Relevance
Strongly relevant feature
For a learning problem, a feature $x_i$ of the input space is strongly relevant if removing it degrades the Bayes optimal classifier.
Weakly relevant feature
For a learning problem, a feature $x_i$ of the input space is weakly relevant if
- it is not a Strongly relevant feature, and
- there exists a subset of features $S$ such that adding $x_i$ to $S$ improves the Bayes optimal classifier.
Irrelevant feature
For a learning problem, a feature $x_i$ of the input space is irrelevant if it is not a Strongly relevant feature or a Weakly relevant feature.
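As a toy example: take a binary feature $x_1$, an exact copy $x_2 = x_1$, some pure noise $x_3$, and the label $y = x_1$. Neither $x_1$ nor $x_2$ is strongly relevant, since removing either one leaves a copy behind and the Bayes optimal classifier is unchanged; each is weakly relevant, because adding it to the empty subset of features improves the Bayes optimal classifier; and $x_3$ is irrelevant.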
Whilst relevance is defined abstractly, in terms of the Bayes optimal classifier, you can have irrelevant features that still help a particular algorithm. These features are useful.
Useful feature
For a learning problem, learning algorithm and measure of error, a feature $x_i$ of the input space is useful if including it can decrease the error.
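A minimal sketch of an irrelevant-but-useful feature, assuming scikit-learn (the perceptron-without-bias setup is just an illustrative choice): a constant feature cannot change the Bayes optimal classifier, yet it lets a perceptron with no bias term put its decision boundary somewhere other than the origin.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Toy 1-D data: the label is 1 when x > 2, so the true boundary does not
# pass through the origin.
rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=(200, 1))
y = (x[:, 0] > 2).astype(int)

# With no bias term the perceptron can only place its boundary at the
# origin, so this single (strongly relevant) feature is not enough.
no_bias = Perceptron(fit_intercept=False, random_state=0).fit(x, y)
print("training error without constant feature:", 1 - no_bias.score(x, y))

# Append a constant feature. It is irrelevant by the definition above,
# but useful: it acts as a bias term, and the training error typically
# drops from roughly 0.5 to (near) 0.
x_aug = np.hstack([x, np.ones_like(x)])
with_bias = Perceptron(fit_intercept=False, random_state=0).fit(x_aug, y)
print("training error with constant feature:  ", 1 - with_bias.score(x_aug, y))
```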