This is the data set for the restaurant example in "Learning Decision Trees" (ppt) (Go to upload page).

Entropy of the entire data set

Given a system P with probability distribution P(s) that the system would be in state si with probability pi, the entropy H(P) is define as follows. $H(P) = - \sum_{i} p_i \,log\,p_i$

For the example data there are 12 items—each assumed to be equally likely—and two possible states: T and F. In this case half are T and half are F. The entropy is computed as follows. \begin{alignat}{5} H & = - ( \tfrac{6}{12} \times log_2 \,\tfrac{6}{12} + \tfrac{6}{12} \times log_2 \,\tfrac{6}{12} ) \\ & = - 2 \times (\tfrac{1}{2} \times log_2 \,\tfrac{1}{2})\\ & = - log_2 \,\tfrac{1}{2}\\ & = log_2 \,2 \\ & = 1 \end{alignat}

Since there is an equal chance of seeing a T or an F, we have no information about the data set. The entropy is at its maximal value of 1.

In other words, the information we want to know about a situation is whether to wait or not. That's one bit of information. Since there are 6 cases in which we wait and 6 cases in which we don't wait, neither is more likely than the other. Whether we wait or not is a full bit of information.

First split

Consider Patrons as the first splitting attribute

If we use Patrons to split the data set, there will be three subsets: None, Some, and Full.

Patrons:None: {F, F}                The two elements with Patrons:None are both F.
Patrons:Some: {T, T, T, T}          The four elements with Patrons:Some are all T.
Patrons:Full: {F, T, F, F, F, T}    The six elements with Patrons:Full are half T and half F.

The entropy of these sets are respectively:

Patrons:None: - (0 * log2 0 + 1 * log2 1) = 0
Patrons:Some: - (1 * log2 1 + 0 * log2 0) = 0
Patrons:Full: - (0.333 * log2 (0.333) + 0.667 * log2 (0.667)) = 0.333 * 1.58 + 0.667 * 0.584 = 0.91

The entropy of the tree when divided according to Patrons is the sum of the entropies weighted by the size of each subtree.

2/12 * 0 + 4/12 * 0 + 6/12 * 0.91 = 0.46

Consider Type as the first splitting attribute

If we use Type to split the data set, there will be four subsets: Burger, French, Italian, and Thai.

Type:Burger: {T, F, F, T}
Type:French: {T, F}
Type:Italian: {T, F}
Type:Thai: {F, T, T, F}

The entropy of these sets are respectively:

Type:Burger: - ( 2/4 * log2 (2/4) + 2/4 * log2 (2/4)) = 1
Type:French: - ( 1/2 * log2 (1/2) + 1/2 * log2 (1/2)) = 1
Type:Italian: - ( 1/2 * log2 (1/2) + 1/2 * log2 (1/2)) = 1
Type:Thai: - ( 2/4 * log2 (2/4) + 2/4 * log2 (2/4)) = 1

In other words, splitting the entire data set on the Type attribute gives us no information at all! Each of the Type subsets itself has entropy 1. So the entropy of the resulting tree is still 1:

2/12 * 1 + 2/12 * 1 + 2/12 * 1 + 4/12 * 1 = 1

Consider Hungry as the first splitting attribute

If we use Hungry to split the data set, there will be two subsets: True and False.

Hungry:True: {T, F, T, T, T, F, T}
Hungry:False: {T, F, F, F, F}

The entropy of these sets are respectively:

Hungry:True: - (5/7 * log2 (5/7) + 2/7 * log2 (2/7)) = 0.714 * 0.486 + 0.286 * 1.81 = 0.865
Hungry:False: - (1/5 * log2 (1/5) + 4/5 * log2 (4/5) = 0.2 * 2.32 + 0.8 * 0.32 = 0.72

The entropy of the tree when divided according to Hungry is the sum of the entropies weighted by the size of each subtree.

7/12 * 0.865 + 5/12 * 0.72 = 0.80

First split conclusion

The other attributes also produce trees with higher entropy than Patrons

So the first split is on Patrons.

Since two of the subsets produced by splitting on Patrons are homogeneous, we need split only on the third subset (Patrons: Full).

Second split

Consider Hungry as the splitting attribute after Patrons

Patrons:Full & Hungry:True = {F, T, F, T}
Patrons:Full & Hungry:False = {F, F}

The entropy of these sets are respectively:

Patrons:Full & Hungry:True: 1 (since the cases are evenly split)
Patrons:Full & Hungry:False: 0 (since the cases are all F)

The entropy of the Patrons:Full subtree when divided according to Hungry is the sum of the entropies weighted by the size of each subtree.

4/6 * 1 + 2/6 * 0 = 0.667

Consider Rain as the splitting attribute after Patrons

Patrons:Full & Rain:True = {F}
Patrons:Full & Rain:False = {F, T, F, F, T}

The entropy of these sets are respectively:

Patrons:Full & Rain:True : 0 (since there is only one case)
Patrons:Full & Rain:False : - (2/5 * log2 (2/5) + 3/5 * log2 (3/5)) = 0.4 * 1.32 + 0.6 * 0.737 = 0.97

The entropy of the Patrons:Full subtree when divided according to Rain is the sum of the entropies weighted by the size of each subtree.

1/6 * 0 + 5/6 * 0.97 = 0.808

Consider Wait Estimate as the splitting attribute after Patrons

Patrons:Full & Wait Estimate:0-10 = { }
Patrons:Full & Wait Estimate:10-30 = {T, F}
Patrons:Full & Wait Estimate:30-60 = {F, T}
Patrons:Full & Wait Estimate:>60 = {F, F}

The entropy of these sets are respectively:

Patrons:Full & Wait Estimate:0-10 = 0
Patrons:Full & Wait Estimate:10-30 = 1
Patrons:Full & Wait Estimate:30-60 = 1
Patrons:Full & Wait Estimate:>60 = 0

The entropy of the Patrons:Full subtree when divided according to Wait Estimate is the sum of the entropies weighted by the size of each subtree.

0 * 0 + 2/6 * 1 + 2/6 * 1 + 2/6 * 0 = 0.667

Second split conclusion

It appears that Hungry and Wait Estimate are equally good attributes to use to split the cases where Patrons:Full and that none of the other attributes is any better.

Using Wait Estimate is a better choice since it produces subsets that can be each be split with a single attribute. Each of the two remaining non-empty subsets can be split on Type. Interestingly, though, the types are not consistent. In one case Type:Thai results in T; in the other case it results in F. That's why when Type is used to split the subset produced by Hungry:True, it doesn't produce homogeneous subsets.

Patrons:Full & Wait Estimate:10-30 & Type:Thai: {T}
Patrons:Full & Wait Estimate:10-30 & Type:Italian: {F}
Patrons:Full & Wait Estimate:30-60 & Type:Thai: {F}
Patrons:Full & Wait Estimate:30-60 & Type:Burger: {T}

Hungry produces subsets that can't be split with a single attribute. Only one subset (Patrons:Full & Hungry:True) need be split, but apparently no single attribute splits it into homogeneous sets. See the tree on slide 20. In other words, the combination of Wait Estimate and Type is a better splitter than the combination of Hungry and Type even though Wait Estimate and Hungry produce equally good results on their own from an entropy perspective.