Non-linear methods

Transformation of the input space.

  • Polynomial regression: Use polynomial features,
  • Step functions: Cut the region up and take the mean in each piece,
  • Regression splines: Cut the region up and fit polynomials that join smoothly at the boundaries,
    • What is the difference between this and Boosting?
  • Smoothing splines: Similar to regression splines, but fitted with a smoothness penalty,
  • Local regression: Similar to splines but with overlapping regions,
  • Generalised Additive Models (GAM): Combining all of the above.

Polynomial regression

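In the modelling framework, saying we are doing polynomial regression means we are picking the modelling paradigm of functions of the form

$$\hat{f}(x) = \sum_{|\alpha| \le d} \beta_{\alpha} x^{\alpha},$$

where

  • $x \in \mathbb{R}^n$ is the input and $x^{\alpha} = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$ for a multi-index $\alpha$,
  • $\beta_{\alpha} \in \mathbb{R}$ are the coefficients to be fitted, and

we say $d$ is the degree of our polynomial.

If $x$ was 1-dimensional this would be:

$$\hat{f}(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d.$$

Here we will focus on the 1-dimensional case.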

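This has issues with error explosion outside the region covered by the training data: the high-order terms dominate there, so extrapolated predictions can be far off.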
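As a minimal sketch of 1-dimensional polynomial regression and its extrapolation problem, the following fits a polynomial by least squares with NumPy. The function names, the synthetic `sin(3x)` data and the degree 7 are illustrative assumptions, not part of the notes.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least squares polynomial fit; returns coefficients, lowest power first."""
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict_polynomial(beta, x):
    """Evaluate the fitted polynomial at the points x."""
    return np.vander(x, len(beta), increasing=True) @ beta

# Synthetic training data on [-1, 1] (an assumption for illustration).
rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 50)
y_train = np.sin(3.0 * x_train) + rng.normal(scale=0.1, size=x_train.size)

beta = fit_polynomial(x_train, y_train, degree=7)

# Worst-case error inside the training range versus outside it.
x_inside = np.linspace(-1.0, 1.0, 200)
x_outside = np.linspace(2.0, 3.0, 200)
err_inside = np.max(np.abs(predict_polynomial(beta, x_inside) - np.sin(3.0 * x_inside)))
err_outside = np.max(np.abs(predict_polynomial(beta, x_outside) - np.sin(3.0 * x_outside)))
print(err_inside, err_outside)
```

The error inside the training range stays small, while the error on `[2, 3]` is dominated by the high-order terms.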

Step functions

Step function methods

Suppose we have a function $f$ we want to model. A step function method uses a step function to do this: we split the domain into intervals and define a separate (constant) function on each one.

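One of the simplest step function methods is to choose meaningful intervals in your domain, then set the value of $\hat{f}$ on each interval to the mean of the training responses in that interval. This is rarely applied on its own; we introduce it as an idea to build on with other methods.

To apply this we use a one-hot encoding of the intervals: each input is mapped to an indicator vector marking the interval it falls in, and we fit a linear regression on those indicators.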

We should only use this if there are natural breakpoints where we can input domain knowledge.
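A minimal sketch of the one-hot approach, assuming NumPy and hand-picked breakpoints. The helper names and the toy data are hypothetical. Least squares on the one-hot features, with no separate intercept, recovers the mean response of each interval.

```python
import numpy as np

def one_hot_intervals(x, breakpoints):
    """One-hot encode which interval each x falls in.

    K breakpoints c_1 < ... < c_K give K + 1 intervals:
    (-inf, c_1), [c_1, c_2), ..., [c_K, inf).
    """
    idx = np.searchsorted(breakpoints, x, side="right")  # interval index per point
    return np.eye(len(breakpoints) + 1)[idx]

def predict_step(x, breakpoints, levels):
    """Look up the fitted constant for the interval each x falls in."""
    return levels[np.searchsorted(breakpoints, x, side="right")]

x = np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5])
y = np.array([1.0, 3.0, 10.0, 12.0, 20.0, 22.0])
breakpoints = np.array([2.0, 4.0])  # hypothetical, chosen by hand

H = one_hot_intervals(x, breakpoints)
# Linear regression on the one-hot features.
levels, *_ = np.linalg.lstsq(H, y, rcond=None)

# Each fitted level matches the mean response within its interval.
means = np.array([y[:2].mean(), y[2:4].mean(), y[4:].mean()])
print(levels, means)
```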

Piece-wise polynomials

Here, instead of using a constant value on each interval, we fit a polynomial on each interval.

Basis function

The indicator functions of the intervals, multiplied by the monomials of the polynomial, form a basis of the function space we are fitting in.

Regression splines

A degree $D$ spline is a piecewise polynomial of degree $D$ whose derivatives are continuous up to order $D - 1$ at each breakpoint.

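Cubic spline

A cubic spline is given by

$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{k=1}^{K} \beta_{3+k} (x - \xi_k)_+^3,$$

where $\xi_k$ denotes the cut points (knots) for $1 \le k \le K$ and

$$(x - \xi)_+^3 = \begin{cases} (x - \xi)^3 & \text{if } x > \xi, \\ 0 & \text{otherwise.} \end{cases}$$

The functions $1, x, x^2, x^3, (x - \xi_1)_+^3, \dots, (x - \xi_K)_+^3$ form a basis of the cubic splines with these knots.

This generalises to splines of any degree $D$, using truncated powers $(x - \xi_k)_+^D$ in the same form.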
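The truncated power basis for a cubic spline can be sketched as follows: build one column per basis function, then fit the coefficients by ordinary least squares. This assumes NumPy. The function name, knot positions and synthetic data are illustrative choices only.

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Columns 1, x, x^2, x^3, (x - xi_1)_+^3, ..., (x - xi_K)_+^3."""
    x = np.asarray(x, dtype=float)
    powers = [np.ones_like(x), x, x**2, x**3]
    # Truncated cubes: zero to the left of each knot.
    truncated = [np.clip(x - xi, 0.0, None) ** 3 for xi in knots]
    return np.column_stack(powers + truncated)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)
knots = [2.5, 5.0, 7.5]  # hypothetical knot positions

B = cubic_spline_basis(x, knots)  # 4 + K columns
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ beta

rmse = np.sqrt(np.mean((fitted - np.sin(x)) ** 2))
print(B.shape, rmse)
```

Because every basis function has continuous first and second derivatives, the fitted curve is automatically a cubic spline. No explicit continuity constraints are needed.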

Degrees of freedom

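There will be $4 + K$ degrees of freedom, where $K$ is the number of breakpoints.

In comparison, piecewise cubic polynomial regression with $K$ breakpoints has $4(K + 1)$ degrees of freedom: one cubic with 4 coefficients on each of the $K + 1$ intervals. The spline imposes 3 continuity constraints at each breakpoint (value, first derivative and second derivative), leaving $4(K + 1) - 3K = K + 4$.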

Natural cubic spline

A natural cubic spline is a cubic spline that is constrained to be linear before the first knot and after the last knot.

Choosing knots

  • Place more knots where your function varies more rapidly.
  • Place knots uniformly, e.g. at quantiles of the data so that each interval holds a similar amount of data.
  • Use cross validation to choose the number of knots.
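As a sketch of the last two options combined, the following places knots at quantiles of the training inputs and uses k-fold cross validation to compare different numbers of knots. It assumes NumPy and a truncated power cubic basis. All function names, the fold count and the synthetic data are hypothetical.

```python
import numpy as np

def spline_basis(x, knots):
    """Truncated power basis for a cubic spline."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def quantile_knots(x, n_knots):
    """Interior quantiles, so each interval holds roughly the same amount of data."""
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    return np.quantile(x, probs)

def cv_error(x, y, n_knots, n_folds=5, seed=0):
    """Mean squared validation error averaged over the folds."""
    order = np.random.default_rng(seed).permutation(x.size)
    errors = []
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)
        knots = quantile_knots(x[train], n_knots)  # knots from training data only
        beta, *_ = np.linalg.lstsq(spline_basis(x[train], knots), y[train], rcond=None)
        pred = spline_basis(x[fold], knots) @ beta
        errors.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, 300)
y = np.sin(2.0 * x) + rng.normal(scale=0.2, size=x.size)

scores = {k: cv_error(x, y, k) for k in range(1, 11)}
best_n_knots = min(scores, key=scores.get)
print(best_n_knots, scores[best_n_knots])
```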

Smoothing splines

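First choose a loss function, the residual sum of squares:

$$\mathrm{RSS}(g) = \sum_{i=1}^{n} (y_i - g(x_i))^2.$$

If $g$ can be anything then we can drive this error to 0, but we would massively overfit. We could instead try minimising it subject to $g$ being smooth.

Smoothing splines do this by minimising the following loss function:

$$\sum_{i=1}^{n} (y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt,$$

where $\lambda \ge 0$ weights the roughness penalty.

The minimiser has the following properties:

  1. It is a piecewise cubic polynomial with knots at the unique values of $x_i$,
  2. Its first and second derivatives are continuous at the knots,
  3. It is linear outside the boundary knots.

This is a natural cubic spline, but not the same one we would get from the regression spline technique above with knots at the $x_i$. It is a shrunken version of it, where $\lambda$ controls the level of shrinkage.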

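Choosing $\lambda$

  • The penalty weight $\lambda$ controls the smoothness and hence the effective degrees of freedom.
  • As $\lambda$ increases from zero to infinity, the fitted function goes from a curve that passes through every data point to a straight line, i.e. the effective degrees of freedom go from $n$ to 2.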
  • Use cross validation.
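To illustrate how the penalty weight moves the fit between interpolation and a straight line, here is a discrete analogue of the smoothing spline loss. It is not the smoothing spline itself: the integral of the squared second derivative is replaced by a sum of squared second differences, and the inputs are assumed to be equally spaced. The function name and data are hypothetical.

```python
import numpy as np

def second_difference_smoother(y, lam):
    """Minimise sum (y_i - g_i)^2 + lam * sum (g_{i+1} - 2 g_i + g_{i-1})^2.

    Discrete stand-in for the smoothing spline loss, assuming equally spaced x.
    """
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # (n - 2) x n second-difference matrix
    # Setting the gradient to zero gives (I + lam * D^T D) g = y.
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

def roughness(g):
    """Sum of squared second differences."""
    return float(np.sum(np.diff(g, n=2) ** 2))

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

g_rough = second_difference_smoother(y, lam=0.0)  # no penalty: reproduces the data
g_mid = second_difference_smoother(y, lam=10.0)   # in between
g_line = second_difference_smoother(y, lam=1e8)   # heavy penalty: nearly a straight line

# Ordinary least squares line, for comparison with the heavy-penalty fit.
line = np.polyval(np.polyfit(x, y, 1), x)
print(roughness(g_rough), roughness(g_mid), roughness(g_line))
```

In practice the penalty weight would be chosen by cross validation rather than by hand.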

Local regression

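  • Choose the number of points to consider.
  • For a target value $x_0$, find the training points closest to it.
  • Weight these points using a Gaussian centred on $x_0$, so that closer points count for more.
  • Apply weighted linear regression to these points to get a local fit.
  • Then use this local fit to make a prediction at $x_0$.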
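The steps above can be sketched as a single function that predicts at one target point. This assumes NumPy. The function name, the neighbour count and the Gaussian bandwidth are hypothetical defaults, not values from the notes.

```python
import numpy as np

def local_linear_predict(x_train, y_train, x0, n_neighbours=20, bandwidth=0.5):
    """Predict at x0 with a Gaussian-weighted linear fit on the nearest points."""
    # Keep only the n_neighbours training points closest to x0.
    nearest = np.argsort(np.abs(x_train - x0))[:n_neighbours]
    xs, ys = x_train[nearest], y_train[nearest]
    # Gaussian weights: points nearer to x0 count for more.
    w = np.exp(-0.5 * ((xs - x0) / bandwidth) ** 2)
    # Weighted least squares for y ~ a + b * (x - x0), done by scaling
    # rows with sqrt(w); the intercept a is the prediction at x0.
    X = np.column_stack([np.ones_like(xs), xs - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], ys * sw, rcond=None)
    return coef[0]

x = np.linspace(0.0, 10.0, 200)

# On noise-free linear data the local fit recovers the line itself.
pred_linear = local_linear_predict(x, 2.0 * x + 1.0, x0=3.3)

# On a curved function the prediction is close to, but not exactly, the truth.
pred_sin = local_linear_predict(x, np.sin(x), x0=5.0)
print(pred_linear, pred_sin, np.sin(5.0))
```

Repeating this for a grid of target values traces out the full fitted curve.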

GAM

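Here we apply a different model to each variable and add the results together: $y = \beta_0 + f_1(x_1) + \dots + f_p(x_p) + \epsilon$, where each $f_j$ can be any of the methods above.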
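A minimal sketch of an additive model with two variables, using a polynomial for the first and a step function for the second. The two basis expansions sit side by side in one design matrix and are fitted jointly by least squares. This is a simplification: GAM software typically fits smoother components, for example by backfitting. All names and the synthetic data are assumptions.

```python
import numpy as np

def additive_design(x1, x2, degree, breakpoints):
    """Columns: x1, ..., x1^degree (polynomial part) and one-hot intervals of x2 (step part).

    The one-hot columns sum to one, so they also play the role of the intercept.
    """
    poly = np.column_stack([x1**p for p in range(1, degree + 1)])
    idx = np.searchsorted(breakpoints, x2, side="right")
    steps = np.eye(len(breakpoints) + 1)[idx]
    return np.hstack([poly, steps])

rng = np.random.default_rng(4)
x1 = rng.uniform(-1.0, 1.0, 500)
x2 = rng.uniform(0.0, 3.0, 500)
# Synthetic additive truth: quadratic in x1 plus a step in x2.
y = x1**2 + np.where(x2 < 1.5, 0.0, 2.0) + rng.normal(scale=0.05, size=500)

A = additive_design(x1, x2, degree=2, breakpoints=np.array([1.5]))
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# coef holds [x1 term, x1^2 term, level for x2 < 1.5, level for x2 >= 1.5].
print(coef)
```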