Week 10 - Feature selection
Feature selection is the process of deciding which features of the data are worth training on, then picking the subset of them that matter most. The reasons to do this are:
- Knowledge discovery, interpretability, and insight.
  - When you get the results back you are going to have to be able to explain them.
- The curse of dimensionality.
  - If you put too many dimensions in, the accuracy of your models can decrease!
How hard is this problem?
Assume we want to find the best subset of $m$ features out of the $n$ available.
This problem is known to be NP-hard.
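To get a feel for the size of the search: an exhaustive search would have to consider every subset of the $n$ features, and the number of subsets grows exponentially,

$$\sum_{m=0}^{n} \binom{n}{m} = 2^n, \qquad \text{e.g. } 2^{100} \approx 1.3 \times 10^{30}.$$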
Techniques
There are two main families of techniques for this problem:
- Filtering, deciding which features to keep before passing them to the learning algorithm, and
- Wrapping, using the learning algorithm to decide which features to keep.
Filtering has the following trade-offs:
- It is normally faster, as it doesn't train models during selection,
- It is independent of the learning algorithm, making it more generic but also meaning it gets no feedback from it, and
- Its speed comes partly from scoring features in isolation, which may not give the full picture.
Wrapping has the following trade-offs:
- It takes the model's inductive bias into account, and
- It can be very slow depending on the algorithm.
We have already come across some methods of doing filtering:
- Information entropy,
- Decision trees do feature selection already, and
- Neural networks do feature selection internally.
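A rough sketch of filtering (using scikit-learn and mutual information as one illustrative choice of scoring function): score each feature against the labels on its own, then keep only the top scorers before any model is trained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Score every feature on its own against the labels - no learner involved.
scores = mutual_info_classif(X, y, random_state=0)
print("mutual information per feature:", np.round(scores, 3))

# Keep only the k highest-scoring features.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)
print("kept columns:", selector.get_support(indices=True), "new shape:", X_reduced.shape)
```

The learning algorithm never sees the discarded columns, which is exactly why filtering is fast but receives no feedback.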
We have already come across some ways to do wrapping too:
- Hill climbing,
- Random optimisation more generally,
- Gradient descent,
- Forward search, where we keep adding features until the score stops improving, and
- Backward search, where we keep removing features until the score starts to drop.
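Forward search is the clearest example of wrapping. A minimal sketch, assuming scikit-learn with a decision tree as the wrapped learner (both are illustrative choices): greedily add whichever feature most improves cross-validated accuracy, and stop when no addition helps.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

selected, best_score = [], 0.0
remaining = list(range(X.shape[1]))

# Greedy forward search: add the single feature that most improves
# cross-validated accuracy; stop when no candidate improves the score.
while remaining:
    candidate_scores = {
        f: cross_val_score(model, X[:, selected + [f]], y, cv=5).mean()
        for f in remaining
    }
    best_feature, score = max(candidate_scores.items(), key=lambda kv: kv[1])
    if score <= best_score:
        break
    selected.append(best_feature)
    remaining.remove(best_feature)
    best_score = score

print("selected features:", selected, "cv accuracy:", round(best_score, 3))
```

Every step retrains and re-evaluates the model once per remaining feature, which is exactly where the slowness of wrapping comes from.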
Relevance
Strongly relevant feature
For a learning problem, a feature $x_i$ of the input space is strongly relevant if removing it degrades the Bayes optimal classifier.
Weakly relevant feature
For a learning problem, a feature $x_i$ of the input space is weakly relevant if
- it is not a Strongly relevant feature, and
- there exists a subset of features $S$ such that adding $x_i$ to $S$ improves the Bayes optimal classifier.
Irrelevant feature
For a learning problem, a feature $x_i$ of the input space is irrelevant if it is not a Strongly relevant feature or a Weakly relevant feature.
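As a toy example: take a binary feature $x_1$, an exact copy $x_2 = x_1$, some pure noise $x_3$, and the label $y = x_1$. Neither $x_1$ nor $x_2$ is strongly relevant, since removing either one leaves a copy behind and the Bayes optimal classifier is unchanged; each is weakly relevant, because adding it to the empty subset of features improves the Bayes optimal classifier; and $x_3$ is irrelevant.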
Whilst relevance is defined abstractly, in terms of the Bayes optimal classifier, you can have irrelevant features that still help a particular algorithm. These features are useful.
Useful feature
For a learning problem, learning algorithm and measure of error, a feature $x_i$ of the input space is useful if including it can decrease the error.
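A minimal sketch of an irrelevant-but-useful feature, assuming scikit-learn (the perceptron-without-bias setup is just an illustrative choice): a constant feature cannot change the Bayes optimal classifier, yet it lets a perceptron with no bias term put its decision boundary somewhere other than the origin.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Toy 1-D data: the label is 1 when x > 2, so the true boundary does not
# pass through the origin.
rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=(200, 1))
y = (x[:, 0] > 2).astype(int)

# With no bias term the perceptron can only place its boundary at the
# origin, so this single (strongly relevant) feature is not enough.
no_bias = Perceptron(fit_intercept=False, random_state=0).fit(x, y)
print("training error without constant feature:", 1 - no_bias.score(x, y))

# Append a constant feature. It is irrelevant by the definition above,
# but useful: it acts as a bias term, and the training error typically
# drops from roughly 0.5 to (near) 0.
x_aug = np.hstack([x, np.ones_like(x)])
with_bias = Perceptron(fit_intercept=False, random_state=0).fit(x_aug, y)
print("training error with constant feature:  ", 1 - with_bias.score(x_aug, y))
```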