Statement

Lemma

Suppose we have some target where the training data has i.i.d. normally distributed noise values such that we have . Then finding the maximum likelihood estimation is the same as minimising the Mean squared error (MSE), i.e.

Proof

This comes from direct calculation.