Boosting

Boosting is an Ensemble learning method that uses multiple instances of another model to average them out to make a classifier that is better than any one individual one.

Suppose we are in the modelling framework where and we have a training set where . Boosting iteratively train classifiers based on how the past classifiers have performed. To do this before training we define a distribution on .

We set
To define the other terms lets introduce some notation.

Let be the error rate of with respect to

Set our learning rate

Other choices of can be made.

Now define iteratively by setting

Note that if and share the same sign then the power of is negative and we decrease the weight of whereas if they don’t we increase the weight of (generally there are some cases where this isn’t true!).

Once we have trained for some large number times we aggregate this all by defining our output with being the Sign function.

Pseudocode

Name(classifier, N, T):
	Input:
		classifier - some way to train a classifier given some training data and a distrubution.
		N - the number of itterations we want to do.
		T - pairs of points (a_t, b_t) to train on.
	Output:
		a moddel f on the space A to {-1, 1}.
1. Set D: T -> [0,1] to be D(t) = 1/|T|
2. For i = 1, ... , number_of_itterations do
	2.1. let h_i = classifier(T, D)
	2.2. Set alpha_i = calculate_learning_rate(h_i, T, D)
	2.2. Set D = Update_distrubution(h_i, T, D, alpha_i)
3. Return sgn(sum(alpha_t h_t(x) for t in T))
 
calculate_learning_rate(h, T, D):
	Input:
		h - A classifier mapping points in A to a real number 
		T - pairs of points (a_t, b_t) to train on.
		D - A distrubution over T
	Output:
		a learning rate for this itteration
1. Set e = 0
2. For each t in T
	2.1. If b_t != sgn(h(a_t)) then
		2.1.1 Set e = e + D(t)
3. Return ln((1-e)/e)/2
 
Update_distrubution(h, T, D, alpha)
	Input:
		h - A classifier mapping points in A to a real number 
		T - pairs of points (a_t, b_t) to train on.
		D - A distrubution over T
		alhpa - a learning rate
	Output:
		a new distrubution over T
1. For each t in T
	2.1. Set C_t = D(t) exp(- alpha b_t h(a_t))
2. Set Z = sum(C_t for t in T)
3. Intialise new distrubution over T D'
4. For each t in T
	4.1. Set D'(t) = C_t/Z
5. Return D'

Correctness