Single linkage clustering
Suppose we are in the clustering problem set up. Here we are given the number of desired clusters
The idea will be to make every element in our training data
where
Pseudocode
Name(k, d, T):
Input:
k - the number of clusters
d - the a symetric distance on T
T - training data
Output:
A function from f splitting T into k groups
1. Set f to be the identity function.
2. While the codain of f has size larger than k:
1. Set C_a, C_b, min_distance to be empty
2. For add X, Y in f^{-1}(T):
1. Caluclate dist = min_{x in X, y in Y} d(x,y)
2. if dist < min_distance:
1. Set C_a = X, C_b = Y
3. Update f(C_b) = f(C_a)
3. Return f
Run time
If you implement this in the worst case it is
Correctness
Whilst it solves the problem it can lead to counter intuitive answers as chains for very closely grouped points will cluster together even when by eye you might split them differently.