Naive Bayes classifier

Suppose we have a classification problem and we are in the modelling framework. We are given features $X_1, \ldots, X_n$ where each feature $X_i$ has a finite set of values $\mathrm{dom}(X_i)$ (so we split the input random variable up into $X = (X_1, \ldots, X_n)$). Let our classification $Y$ also have finite domain $\mathrm{dom}(Y)$.

We can use our training data to estimate $P(Y = y)$ for each $y \in \mathrm{dom}(Y)$ and $P(X_i = x \mid Y = y)$ for each $i$ and each pair $(x, y) \in \mathrm{dom}(X_i) \times \mathrm{dom}(Y)$.
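
A minimal sketch of this counting-based estimation, assuming the training data arrives as a list of feature tuples `xs` with matching labels `ys` (all names here are illustrative, not from the notes):

```python
from collections import Counter, defaultdict

def estimate_parameters(xs, ys):
    """Estimate P(Y = y) and P(X_i = x | Y = y) by counting.

    xs: list of feature tuples, ys: list of class labels.
    """
    n = len(ys)
    class_counts = Counter(ys)             # count(Y = y)
    feature_counts = defaultdict(Counter)  # count(X_i = x, Y = y), keyed by (i, y)
    for features, y in zip(xs, ys):
        for i, x in enumerate(features):
            feature_counts[(i, y)][x] += 1

    prior = {y: c / n for y, c in class_counts.items()}  # P(Y = y)
    likelihood = {
        (i, y): {x: c / class_counts[y] for x, c in counts.items()}
        for (i, y), counts in feature_counts.items()     # P(X_i = x | Y = y)
    }
    return prior, likelihood
```

Note this is a single pass over the data: each data point is touched exactly once.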

Assumption

We make the assumption that $Y$ and each of the $X_i$ form a Bayesian network where $Y$ is the root with one arrow to each of the $X_i$, with the probabilities above as its conditional probability tables. This gives the factorisation $P(Y, X_1, \ldots, X_n) = P(Y) \prod_{i=1}^{n} P(X_i \mid Y)$, i.e. the features are conditionally independent given the class.

Using Bayes' rule, then marginalisation and the chain rule, we can calculate

$$P(Y = y \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{1}{Z}\, P(Y = y) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y)$$

with the normalisation term $Z = \sum_{y' \in \mathrm{dom}(Y)} P(Y = y') \prod_{i=1}^{n} P(X_i = x_i \mid Y = y')$. Then to compute the maximum a posteriori (MAP) class we don't need to compute $Z$, and we let

$$\hat{y} = \operatorname{arg\,max}_{y \in \mathrm{dom}(Y)} P(Y = y) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y).$$

This exactly forms our Naive Bayes classifier for this problem.
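
Given the estimates from the sketch above, the MAP step could look like the following (using log-probabilities, a standard trick to avoid floating-point underflow, rather than the raw product):

```python
import math

def predict(features, prior, likelihood):
    """Return the MAP class: argmax_y P(y) * prod_i P(x_i | y).

    Sums log-probabilities instead of multiplying, to avoid underflow.
    """
    best_y, best_score = None, -math.inf
    for y, p_y in prior.items():
        score = math.log(p_y)
        for i, x in enumerate(features):
            p = likelihood.get((i, y), {}).get(x, 0.0)
            if p == 0.0:           # an unseen value zeros the whole product
                score = -math.inf  # (see the smoothing discussion below)
                break
            score += math.log(p)
        if score > best_score:
            best_y, best_score = y, score
    return best_y
```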

Run time

It is an eager learner, so it requires processing the training data up front. However, we only ever need to see each data point once, so it is fairly fast: estimating the parameters is a single counting pass. The model also has a small number of parameters, one prior over $Y$ plus one conditional distribution per feature per class, so the parameter count grows linearly in the number of features.
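
As a rough count (assuming the finite domains above), the number of free parameters in this network is

$$(|\mathrm{dom}(Y)| - 1) + \sum_{i=1}^{n} |\mathrm{dom}(Y)|\,(|\mathrm{dom}(X_i)| - 1),$$

which is linear in the number of features $n$, compared to the exponentially many parameters needed to store the full joint distribution.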

Correctness

Whilst this does make a large assumption about the independence of the different attributes, empirically it is considered very successful. It is thought that Google uses this a lot in Gmail for actual spam filtering!

Limitations

  • It makes a strong assumption that the attributes are conditionally independent of one another given the class, which seems ridiculous.
    • It is very vulnerable to co-dependent variables, as their influence grows exponentially with the number of related variables. (i.e. if 3 variables were identical we would cube that term in the product.)
    • Though in practice it turns out not to matter that much!
  • If any of the estimates $P(X_i = x_i \mid Y = y)$ is zero, it zeros out the entire product.
    • We should account for this by always making these values non-zero. This is called "smoothing" the probabilities (see the smoothed estimate after this list).
    • We need a very large amount of data to guarantee our probabilities are accurate.
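
A common version of this smoothing is Laplace (add-$\alpha$) smoothing, where $\alpha > 0$ is a small pseudo-count (the symbol $\alpha$ is my notation, not from the notes); the adjusted estimate would be

$$\hat{P}(X_i = x \mid Y = y) = \frac{\mathrm{count}(X_i = x, Y = y) + \alpha}{\mathrm{count}(Y = y) + \alpha\,|\mathrm{dom}(X_i)|},$$

which is never zero and approaches the raw-count estimate as the data grows; $\alpha = 1$ gives classic Laplace smoothing.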