Iterative Dichotomiser 3 (ID3)

This algorithm works on classification problems. It is a greedy algorithm for building a decision tree, splitting at each step on the attribute with maximal information gain. We are therefore looking for a tree that represents some function f : A -> B, where we may only have access to some training data {(a, f(a)) | a in A'} for a subset A' of A.

This decision tree is built with regard to the information entropy of each of the leaf nodes within that decision tree: splits are chosen greedily so that the entropy of the training data at each leaf is driven towards zero. For any subset S of A' we can define Entropy(S) by considering drawing a random element out of S uniformly and looking at the probability p_b of it being mapped to each element b of B:

Entropy(S) = - sum_{b in B} p_b log2(p_b),   where p_b = |{a in S : f(a) = b}| / |S|

(with the convention 0 log2(0) = 0).
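
A minimal sketch of this entropy computation in Python (the representation of the training data as a list of (a, f(a)) pairs, and the name entropy, are assumptions made for illustration):

from collections import Counter
from math import log2

def entropy(pairs):
    # Entropy of the label distribution in a non-empty sample of (point, label) pairs.
    total = len(pairs)
    counts = Counter(label for _, label in pairs)
    # p_b = count / total; labels with p_b = 0 never appear in counts, matching 0 log2(0) = 0.
    return -sum((c / total) * log2(c / total) for c in counts.values())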

To make this decision tree, assume we have a finite set of attributes T, where each t in T maps A into a finite set of states {1, ..., n_t} and thereby splits the set A' up into subsets of variable size. For a sample S and attribute t, let S_1, ..., S_{n_t} be the subsets of S given by S_j = t^{-1}(j) intersect S, and define the information gain

Gain(S, t) = Entropy(S) - sum_{j=1}^{n_t} (|S_j| / |S|) Entropy(S_j).
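
Continuing the sketch above, and assuming each attribute is represented as a function from points to one of its finitely many states, the information gain could be computed as:

from collections import defaultdict

def gain(pairs, attribute):
    # Gain(S, t): entropy of the whole sample minus the weighted
    # entropies of the subsamples S_j induced by the attribute.
    total = len(pairs)
    groups = defaultdict(list)
    for a, b in pairs:
        groups[attribute(a)].append((a, b))
    return entropy(pairs) - sum(
        (len(g) / total) * entropy(g) for g in groups.values())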

Lastly, we will construct this tree using a depth-first, recursive procedure, where at each node we work with a subset of our original training set.

Pseudocode

ID3(training_data, attributes):
	Input:
		training_data of the form {(a,f(a)) | a in A'}
		attributes mapping A into some finite set of states
	Output:
		a decision tree for the function f
1. Let A' be the set of points a in A appearing in the training data.
2. If Entropy(A') = 0 return a single leaf node labelled with the one element of B that all the training data maps to.
3. For each attribute t calculate Gain(A',t).
	1. If Gain(A',t) = 0 for all t, return a single leaf node labelled with a maximally likely value in the training data.
	2. Otherwise let t' be a maximal element with respect to Gain(A', -)
4. Build rooted tree T with t' being the root node
5. For each j in 1, ..., n_t'
	1. Let A'_j = t'^{-1}(j) intersect A'
	2. Let T_j = ID3({(a,f(a)) | a in A'_j}, attributes without t')
	3. Attach the root of T_j to node t' with outcome j.
6. Return decision tree T.
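
The pseudocode translates fairly directly into a recursive implementation. The following is a minimal sketch reusing the entropy and gain functions from the earlier sketches; the tree representation (a bare label for a leaf, a pair (attribute, {state: subtree}) for an internal node) is an assumption made for illustration:

from collections import Counter, defaultdict

def id3(pairs, attributes):
    labels = [b for _, b in pairs]
    # Step 2: every training point carries the same label, so the entropy is 0.
    if entropy(pairs) == 0:
        return labels[0]
    # Step 3.1 (vacuous case): no attributes left to split on.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: pick an attribute with maximal information gain.
    gains = {t: gain(pairs, t) for t in attributes}
    best = max(attributes, key=gains.get)
    # Step 3.1: no attribute gives any gain; return a maximally likely label.
    if gains[best] == 0:
        return Counter(labels).most_common(1)[0][0]
    # Steps 4-6: split on the best attribute and recurse on each outcome.
    groups = defaultdict(list)
    for a, b in pairs:
        groups[best(a)].append((a, b))
    remaining = [t for t in attributes if t is not best]
    return (best, {state: id3(group, remaining)
                   for state, group in groups.items()})

To classify a new point a, walk the returned tree: at an internal node (t', children), continue into children[t'(a)] until a leaf label is reached.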

Run time

Correctness