Week 10 - Feature transformation

Feature transformation is the problem of pre-processing a set of features to create a new (smaller/compact) feature set, while retaining as much information as possible. It is a map $x \in \mathbb{R}^N \mapsto y \in \mathbb{R}^M$ where you usually want $M < N$.

In this course we will focus on linear feature reduction, where the transformation $x \mapsto P^{T} x$ is a linear map given by some matrix $P \in \mathbb{R}^{N \times M}$.
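
As an illustrative sketch (not from the notes), the snippet below applies a projection matrix `P` to a data matrix `X`; every method below is a different way of choosing `P`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # 100 samples with N = 10 original features
P = rng.normal(size=(10, 3))     # projection matrix; arbitrary here, chosen by the method in practice

Y = X @ P                        # transformed feature set: 100 samples, M = 3 features
print(Y.shape)                   # (100, 3)
```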

Note

Feature selection is a special case of feature transformation

Problems to overcome

If you think of features in analogy to words in a language, there are two problems when using a feature to label data.

Polysemy

One word having multiple meanings in different contexts.


Synonymy

Multiple words having the same meaning in a particular context.


Principal component analysis

Principal component analysis is a linear dimension reduction algorithm. Conceptually, principal component analysis finds the axes along which the projected data has maximal variance. It does this by finding the Eigenvectors and Eigenvalues of the Covariance matrix, and it uses these eigenvectors as a new basis of the data’s feature space.

It performs dimension reduction by keeping only the eigenvectors with the highest eigenvalues.

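A minimal numpy sketch of the procedure described above (assuming the rows of `X` are samples and its columns are features):

```python
import numpy as np

def pca(X, k):
    """Project X onto the k eigenvectors of its covariance matrix with the largest eigenvalues."""
    X_centred = X - X.mean(axis=0)               # centre the data
    cov = np.cov(X_centred, rowvar=False)        # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh, since the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]            # sort eigenvectors by decreasing eigenvalue
    basis = eigvecs[:, order[:k]]                # keep the top-k eigenvectors as the new basis
    return X_centred @ basis                     # data expressed in the reduced basis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
print(pca(X, 2).shape)                           # (200, 2)
```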

Independent component analysis

Independent component analysis is a form of linear dimension reduction. The goal of independent component analysis is to form a linear map to features which are independent of one another.

Strictly, if your previous features were $x_1, \dots, x_N$ and you map to new features $y_1, \dots, y_M$, then we want the following statements about Mutual information to hold:

  • $I(y_i; y_j) = 0$ for all $i \neq j$, and
  • $I(x; y)$ is maximised, so the new features retain as much information about the old features as possible.

This can be used to solve the Cocktail party problem.

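As a hedged sketch (the notes do not name an algorithm; FastICA from scikit-learn is one common choice), this maps linearly mixed, non-Gaussian sources to approximately independent features:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(1000, 3))   # independent, non-Gaussian source features
X = S @ rng.normal(size=(3, 3))          # observed features are linear mixtures of the sources

ica = FastICA(n_components=3, random_state=0)
Y = ica.fit_transform(X)                 # new features, approximately independent of one another
print(Y.shape)                           # (1000, 3)
```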

Cocktail party problem

Statement

Suppose you have several people talking at the same time, with microphones placed around the room. From the recordings of the microphones, when and how can you recover the individual sounds?

Solutions

Independent component analysis can be used to separate the recordings back into the individual voices; see the comparison with PCA below.

Comparison of ICA and PCA

These two methods do different things. Notice that if we have a large enough set of i.i.d. random variables, then by the Central limit theorem their sum will look normally distributed, and that sum provides an axis that maximises variance. Therefore PCA may pick the direction of their sum, whereas ICA will want to separate them.

Whilst ICA solves the Cocktail party problem very well, PCA is very poor at it. PCA’s goal is to find the most shared features, whereas ICA finds the features that split the data apart. For example, on faces, ICA finds noses, eyes and chins, whereas PCA finds brightness or the average face first.

We can use ICA to understand what separates the points in our data best; however, ICA is not the most efficient algorithm. The understanding of your data that it provides can then be used to implement more efficient algorithms. For example, on documents ICA picks out topics, and on natural images ICA picks out edges, both of which have more efficient dedicated algorithms to find them.
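
A toy comparison under assumed synthetic data (a sine wave and a square wave standing in for two speakers, mixed by a known matrix): ICA’s components tend to line up with the individual sources, while PCA’s components follow maximal-variance directions that mix them.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

t = np.linspace(0, 8, 2000)
S = np.column_stack([np.sin(3 * t), np.sign(np.sin(5 * t))])  # two independent sources
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                                    # mixing matrix ("microphones")
X = S @ A.T                                                   # observed mixtures

pca_est = PCA(n_components=2).fit_transform(X)
ica_est = FastICA(n_components=2, random_state=0).fit_transform(X)

# Absolute correlations between true sources (rows) and estimated components (columns).
for name, est in [("PCA", pca_est), ("ICA", ica_est)]:
    corr = np.abs(np.corrcoef(S.T, est.T)[:2, 2:])
    print(name, corr.round(2))
```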

Alternatives

Random component analysis

Random component analysis

Random component analysis is a form of linear dimension reduction. It picks a random linear map and applies it.

This works mainly due to The curse of dimensionality.
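
A sketch using scikit-learn’s GaussianRandomProjection, assuming this is what the notes mean by random component analysis:

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))          # high-dimensional data

rp = GaussianRandomProjection(n_components=50, random_state=0)
Y = rp.fit_transform(X)                   # project onto 50 random directions
print(Y.shape)                            # (100, 50)

# With enough random directions, pairwise distances tend to be roughly preserved.
print(np.corrcoef(pairwise_distances(X).ravel(), pairwise_distances(Y).ravel())[0, 1])
```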


Transclude of Linear-discriminant-analysis