Single linkage clustering

Suppose we are in the clustering problem set up. Here we are given the number of desired clusters , and the sense of distance .

The idea will be to make every element in our training data its own cluster. Then we take the union of two clusters which are considered the closest. For this linkage clustering we use the minimum distance of points in the cluster. i.e.

where is the power set of . We keep iterating this until we are left with clusters.

Pseudocode

Name(k, d, T):
	Input:
		k - the number of clusters
		d - the a symetric distance on T
		T - training data
	Output:
		A function from f splitting T into k groups
1. Set f to be the identity function.
2. While the codain of f has size larger than k:
	1. Set C_a, C_b, min_distance to be empty 
	2. For add X, Y in f^{-1}(T):
		1. Caluclate dist = min_{x in X, y in Y} d(x,y)
		2. if dist < min_distance:
			1. Set C_a = X, C_b = Y
	3. Update f(C_b) = f(C_a)
3. Return f

Run time

If you implement this in the worst case it is but realistically you could get it to be .

Correctness

Whilst it solves the problem it can lead to counter intuitive answers as chains for very closely grouped points will cluster together even when by eye you might split them differently.