Expectation Maximisation

Expectation maximisation (EM) is an algorithm that can be applied to problems such as clustering. It is used in situations where you assume your data follows some underlying distribution which, in this instance, is given by parameters $\theta$. We assume our hypothesis space $H$ consists of candidate values $h$ for these parameters.

Suppose we draw samples $y_i$ from the full data $Y = X \cup Z$, which has an observed component $X$ and an unobserved component $Z$. Therefore the samples of $Y$ are given by $y_i = (x_i, z_i)$.
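
To make this concrete, here is a minimal sketch of such a setting, assuming a mixture of two unit-variance Gaussians so that $\theta$ is just the pair of means (the names `mu_true`, `z`, and `x` below are illustrative, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: a mixture of two unit-variance Gaussians,
# so theta is just the pair of means (mu_1, mu_2).
mu_true = np.array([-2.0, 3.0])

# z_i is the unobserved component: which Gaussian produced sample i.
z = rng.integers(0, 2, size=500)

# x_i is the observed component: a draw from the Gaussian chosen by z_i.
x = rng.normal(loc=mu_true[z], scale=1.0)

# The full samples are y_i = (x_i, z_i), but the algorithm only ever sees x.
```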

Once you are provided with $X$ you can treat the values of $Z$ as random variables, and similarly for $Y$. Both $Z$ and $Y$ depend on the unknown parameters $\theta$.

During the EM algorithm we will update our current hypothesis $h$ for the parameters $\theta$ to a new hypothesis $h'$. We want to update by trying to maximise

$$Q(h' \mid h) = E\left[\ln P(Y \mid h') \mid h, X\right]$$

Let's unpack this: we use our current hypothesis $h$ to estimate the values of $Z$. This gives us the full data $Y$, and therefore the expected value of $\ln P(Y \mid h')$. Now we want to find the hypothesis $h'$ (or parameters $\theta'$ for the distribution) that maximises the log probability that this $Y$ would occur. This follows a nice, natural two-step process.

  1. Estimation (E) step: given our current hypothesis $h$ and the observed data $X$, calculate the expected values of the unobserved variables $Z$; this gives us the function $Q(h' \mid h)$, and then
  2. Maximisation (M) step: using the values of $X$ and the expectations for $Z$, find the hypothesis $h \leftarrow \arg\max_{h'} Q(h' \mid h)$ (see the sketch after this list).
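
As a sketch of one iteration for the mixture above (still assuming unit variances and equal mixing weights, so only the means are estimated; `em_step` and `normal_pdf` are hypothetical helper names, not a standard API):

```python
import numpy as np

def normal_pdf(x, mu, sigma=1.0):
    # Density of N(mu, sigma^2) evaluated at each point of x.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def em_step(x, mu):
    # E step: responsibilities r[i, j] = P(z_i = j | x_i, h), the expected
    # values of the unobserved indicators under the current hypothesis h = mu.
    p = np.stack([normal_pdf(x, mu[0]), normal_pdf(x, mu[1])], axis=1)
    r = p / p.sum(axis=1, keepdims=True)
    # M step: the hypothesis h' that maximises Q(h' | h); for this model it
    # has a closed form, the responsibility-weighted mean of the observed data.
    return (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
```

For this model the M step is closed-form because $Q$ is a weighted sum of Gaussian log densities; in general the M step may itself require numerical optimisation.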

If the function $Q$ is continuous for all $h$ then we will converge to a local maximum of the likelihood function; however, we don't know how long this will take us, or whether this local maximum is a global maximum.
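
Since the number of iterations is unknown in advance, a common stopping rule (an assumption of this sketch, not part of the algorithm's definition) is to iterate until the observed-data log likelihood stops improving, reusing `em_step` and `normal_pdf` from above:

```python
mu = np.array([0.0, 1.0])        # arbitrary initial hypothesis
prev_ll = -np.inf
for _ in range(200):
    mu = em_step(x, mu)
    # Observed-data log likelihood; it never decreases between EM iterations.
    ll = np.log(0.5 * normal_pdf(x, mu[0]) + 0.5 * normal_pdf(x, mu[1])).sum()
    if ll - prev_ll < 1e-8:      # stop once the improvement is negligible
        break
    prev_ll = ll
```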

Correctness

  • Each iteration monotonically decreases the maximum-likelihood error of the groupings; equivalently, the likelihood of the data never decreases.
  • This is not guaranteed to converge.
  • It will not diverge either, though!
  • It can get stuck in local optima.
    • So it is best to restart this algorithm several times and take the best outcome, as sketched below.
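
A minimal random-restart wrapper along those lines, again reusing the hypothetical `em_step` and `normal_pdf` helpers from the sketches above:

```python
import numpy as np

best_ll, best_mu = -np.inf, None
for seed in range(10):                               # several random restarts
    mu = np.random.default_rng(seed).normal(size=2)  # random initial hypothesis
    for _ in range(200):                             # fixed iteration budget per run
        mu = em_step(x, mu)
    # Score this run by its observed-data log likelihood.
    ll = np.log(0.5 * normal_pdf(x, mu[0]) + 0.5 * normal_pdf(x, mu[1])).sum()
    if ll > best_ll:                                 # keep the best local optimum
        best_ll, best_mu = ll, mu
```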