Naive Bayes classifier
Suppose we have a classification problem and we are in the modelling framework. We are given features $X_1, X_2, \ldots, X_n$ and want to predict a class $Y$.
We can use our training data to estimate $P(Y = y)$ and $P(X_i = x_i \mid Y = y)$ for each attribute $X_i$.
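A minimal sketch of this estimation step in Python, assuming boolean attributes; the training data and attribute names are made up for illustration:

```python
from collections import Counter, defaultdict

# Illustrative training data: each row is (features, label).
train = [
    ({"contains_offer": True,  "all_caps": True},  "spam"),
    ({"contains_offer": True,  "all_caps": False}, "spam"),
    ({"contains_offer": False, "all_caps": False}, "ham"),
    ({"contains_offer": False, "all_caps": True},  "ham"),
]

# Estimate P(Y = y) from class frequencies.
label_counts = Counter(label for _, label in train)
prior = {y: c / len(train) for y, c in label_counts.items()}

# Estimate P(X_i = x_i | Y = y) from conditional frequencies.
cond_counts = defaultdict(Counter)  # (attribute, label) -> Counter of values
for features, label in train:
    for attr, value in features.items():
        cond_counts[(attr, label)][value] += 1

def likelihood(attr, value, label):
    counts = cond_counts[(attr, label)]
    return counts[value] / sum(counts.values())
```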
Assumption
Making the assumption that
$$P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = \prod_{i=1}^{n} P(X_i = x_i \mid Y = y),$$
i.e. the attributes are conditionally independent given the class, then $Y$ and each of the $X_i$ form a Bayesian network where $Y$ is the root with one arrow to each of the $X_i$, with the probabilities above.
Using Bayes' rule, then marginalisation and the chain rule, we can calculate
$$P(Y = y \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{1}{Z} \, P(Y = y) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y)$$
with the normalisation term
$$Z = \sum_{y'} P(Y = y') \prod_{i=1}^{n} P(X_i = x_i \mid Y = y').$$
This exactly forms our Naive Bayes classifier for this problem.
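Continuing the sketch above, classification multiplies the prior by each attribute's likelihood and normalises, exactly as in the formula:

```python
def classify(features):
    # Unnormalised score: P(Y = y) * prod_i P(X_i = x_i | Y = y)
    scores = {}
    for y in prior:
        score = prior[y]
        for attr, value in features.items():
            score *= likelihood(attr, value, y)
        scores[y] = score
    # Normalisation term Z: sum of the scores over all classes.
    z = sum(scores.values())
    posterior = {y: s / z for y, s in scores.items()}
    return max(posterior, key=posterior.get), posterior

label, posterior = classify({"contains_offer": True, "all_caps": True})
print(label, posterior)  # -> spam {'spam': 1.0, 'ham': 0.0}
```

Notice that the zero count for $P(\text{contains\_offer} = \text{True} \mid \text{ham})$ already zeroes out ham's posterior entirely; this is exactly the problem smoothing addresses below.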
Run time
It is an eager learner, so it requires you to process the training data initially. Though we only ever need to see each data point once - so it is fairly fast. Estimating the parameters is fairly simple counting. This model also has close to the minimum number of parameters you need to factor in every attribute: the count grows linearly with the number of attributes, rather than exponentially as for the full joint distribution.
Correctness
Whilst this does make a large assumption about the different attributes, empirically it is considered very successful. It is thought that Google use this a lot in Gmail for actual spam filtering!
Limitations
- It makes a strong assumption that the attributes are unrelated to one another given the class, which seems ridiculous.
- It is very vulnerable to co-dependent variables, as their influence will grow exponentially the more variables are related (e.g. if 3 variables were identical, we would cube that term).
- Though in practice it turns out not to matter that much!
- If any of the $P(X_i = x_i \mid Y = y)$ is zero, it zeros out the entire product.
- We should account for this by always making these values non-zero. This is called "smoothing" of the probabilities (see the sketch after this list).
- We need a very large amount of data to guarantee our probabilities are accurate.
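A minimal sketch of Laplace ("add-one") smoothing, replacing the `likelihood` function from the earlier sketch; `k` is the number of values the attribute can take (2 for booleans) and `alpha` the pseudo-count, both illustrative defaults:

```python
def smoothed_likelihood(attr, value, label, k=2, alpha=1):
    # Laplace smoothing: pretend we saw every possible value alpha extra
    # times, so no conditional probability is ever exactly zero.
    counts = cond_counts[(attr, label)]
    return (counts[value] + alpha) / (sum(counts.values()) + alpha * k)
```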