MIMIC by dependency trees

This follows on from MIMIC (meta), where we are going to use dependency trees to calculate the distribution. Here we are just implementing a version of the probability_constructor.

We assume the domain of our problem breaks up into features $x_1, \dots, x_n$, and we break up the random variable into components $X = (X_1, \dots, X_n)$ along the same lines. To generate the probability distribution we are going to build a dependency tree. To decide the shape of this dependency tree - due to some maths - we will use the provided pass/fail samples to calculate the Mutual information $I(X_i; X_j)$ for each pair of features.

To do this we need to calculate $\hat{p}(x_i = a)$ for all $i$ and all values $a$, and $\hat{p}(x_i = a, x_j = b)$ for each pair $i \neq j$. If we have $N_{\text{pass}}$ and $N_{\text{fail}}$ samples of passes and fails respectively, this is simply the empirical frequency over the pass samples

$$\hat{p}(x_i = a) = \frac{\#\{\text{pass samples with } x_i = a\}}{N_{\text{pass}}}, \qquad \hat{p}(x_i = a, x_j = b) = \frac{\#\{\text{pass samples with } x_i = a \text{ and } x_j = b\}}{N_{\text{pass}}},$$

however there are more advanced techniques you can apply here (for example smoothing the counts so no probability is estimated as exactly zero).
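As a concrete illustration, here is a minimal sketch of these estimates, assuming binary features and that the pass samples arrive as an $(N_{\text{pass}}, n)$ NumPy array; the function and argument names are my own, not from MIMIC (meta).

```python
import numpy as np

def empirical_probabilities(pass_samples):
    """Empirical estimates from the pass samples (binary features assumed).

    Returns marginal[i]    = p(x_i = 1)
            pairwise[i, j] = p(x_i = 1, x_j = 1)
    The other joint cells follow from these, e.g.
    p(x_i = 1, x_j = 0) = marginal[i] - pairwise[i, j].
    """
    x = np.asarray(pass_samples, dtype=float)  # shape (N_pass, n)
    n_pass = x.shape[0]
    marginal = x.mean(axis=0)
    pairwise = (x.T @ x) / n_pass
    return marginal, pairwise
```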

Now we apply the formulas for Mutual information, Information entropy, and conditional entropy to get

$$I(X_i; X_j) = H(X_i) - H(X_i \mid X_j) = \sum_{a, b} \hat{p}(x_i = a, x_j = b) \log \frac{\hat{p}(x_i = a, x_j = b)}{\hat{p}(x_i = a)\,\hat{p}(x_j = b)}.$$
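A sketch of this pairwise computation, again assuming binary features; `marginal` and `pairwise` are the outputs of the sketch above, and the function name is my own.

```python
import numpy as np

def mutual_information_matrix(marginal, pairwise):
    """info[i, j] = I(X_i; X_j) in nats, with the 0 log 0 = 0 convention."""
    n = len(marginal)
    info = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            pi, pj, p11 = marginal[i], marginal[j], pairwise[i, j]
            # Full joint table over (x_i, x_j) in {0, 1}^2, recovered from the estimates.
            joint = np.array([[1 - pi - pj + p11, pj - p11],
                              [pi - p11, p11]])
            indep = np.outer([1 - pi, pi], [1 - pj, pj])  # product of marginals
            pos = joint > 0
            info[i, j] = info[j, i] = np.sum(joint[pos] * np.log(joint[pos] / indep[pos]))
    return info
```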

Then we construct a complete undirected graph on the nodes $\{X_1, \dots, X_n\}$, then weight the edge $\{X_i, X_j\}$ with $-I(X_i; X_j)$, so that a minimum spanning tree maximises the total mutual information captured. (Note Mutual information is symmetric so we can just use $I(X_i; X_j)$ where $i < j$.)

Then construct an MST on this graph and pick one node to be the root; directing every edge away from the root gives us our dependency tree. We can now use the probabilities calculated earlier to find the probabilities associated with this tree: the root $X_r$ keeps its marginal $\hat{p}(x_r)$, and every other node gets the conditional

$$\hat{p}(x_i \mid x_{\pi(i)}) = \frac{\hat{p}(x_i, x_{\pi(i)})}{\hat{p}(x_{\pi(i)})},$$

where $\pi(i)$ is the parent of $X_i$, so the whole distribution factors as $\hat{p}(x) = \hat{p}(x_r) \prod_{i \neq r} \hat{p}(x_i \mid x_{\pi(i)})$.
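A sketch of the tree construction. Rather than building the weighted graph explicitly, this runs Prim's algorithm directly on the mutual-information matrix, always pulling in the out-of-tree node with the strongest edge (equivalently, the smallest $-I$ weight); the `parent` array representation and the default choice of node 0 as root are my own.

```python
import numpy as np

def dependency_tree(info, root=0):
    """Prim's algorithm on edge weights -I(X_i; X_j); returns parent[i] (-1 at the root)."""
    n = info.shape[0]
    parent = np.full(n, -1)
    in_tree = {root}
    best = info[root].copy()       # best[i]: strongest MI edge from i into the tree so far
    best_from = np.full(n, root)   # ...and which tree node that edge attaches to
    while len(in_tree) < n:
        nxt = max((i for i in range(n) if i not in in_tree), key=lambda i: best[i])
        parent[nxt] = best_from[nxt]
        in_tree.add(nxt)
        improved = info[nxt] > best
        best[improved] = info[nxt, improved]
        best_from[improved] = nxt
    return parent
```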

Once we have this Bayesian network we can use it to simulate new samples to evaluate on.
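Finally, a sketch of the simulation step: ancestral sampling down the tree, drawing the root from its marginal and every other node from $\hat{p}(x_i \mid x_{\pi(i)})$, with the conditionals read off the `marginal`/`pairwise` estimates. The guards against zero denominators are an assumption about how degenerate marginals should be handled, and the function name is again my own.

```python
import numpy as np

def sample_from_tree(parent, marginal, pairwise, n_samples, rng=None):
    """Ancestral sampling: visit parents before children, condition on the parent's value."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(parent)
    order, placed = [], set()
    while len(order) < n:  # simple topological order over the tree
        for i in range(n):
            if i not in placed and (parent[i] == -1 or parent[i] in placed):
                order.append(i)
                placed.add(i)
    out = np.zeros((n_samples, n), dtype=int)
    for i in order:
        j = parent[i]
        if j == -1:
            p1 = marginal[i]  # the root is drawn from its marginal
        else:
            p11 = pairwise[i, j]  # p(x_i = 1, x_j = 1)
            given1 = p11 / marginal[j] if marginal[j] > 0 else marginal[i]
            given0 = (marginal[i] - p11) / (1 - marginal[j]) if marginal[j] < 1 else marginal[i]
            p1 = np.where(out[:, j] == 1, given1, given0)
        out[:, i] = rng.random(n_samples) < p1
    return out
```

Chained together, the sketches above give one naive version of the probability_constructor:

```python
marginal, pairwise = empirical_probabilities(pass_samples)
info = mutual_information_matrix(marginal, pairwise)
parent = dependency_tree(info)
new_samples = sample_from_tree(parent, marginal, pairwise, n_samples=100)
```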

Run time

Let $n$ be the number of features, $N$ the number of samples kept, and $M$ the number of new samples drawn. Estimating all the pairwise probabilities costs $O(n^2 N)$; turning those tables into mutual information is $O(n^2)$; Prim's algorithm on the complete graph is $O(n^2)$; and sampling is $O(nM)$. So each call to the probability_constructor is dominated by the $O(n^2 N)$ estimation step.

Correctness

The maths alluded to above is the Chow-Liu result: among all distributions that factor along a dependency tree, the one built on a maximum-weight spanning tree under mutual information minimises the KL divergence to the empirical distribution of the samples. So this construction returns the best tree-shaped approximation available from the estimated pairwise probabilities.