This page works through the entropy calculations for the restaurant example data set in this presentation.


## Entropy of the entire data set

Given a system P with states s_{i}, each occurring with probability p_{i}, the entropy H(P) is defined as follows (logs are base 2 throughout).

H(P) = -∑_{i} p_{i} log_{2}(p_{i})

For the example data there are 12 items and two possible states: **T** and **F**. In this case half are **T** and half are **F**. The entropy is computed as follows.

H = -(6/12 * log_{2}(6/12) + 6/12 * log_{2}(6/12)) = -2 * (1/2 * log_{2}(1/2)) = -log_{2}(1/2) = log_{2}(2) = 1

Since there is an equal chance of selecting a **T** or an **F**, we have no information about the data set. The entropy is at its maximal value of 1.
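As a sanity check, here is a minimal Python sketch (not from the presentation; the function name `entropy` is our own choice) that computes entropy from raw class counts:

```python
from math import log2

def entropy(counts):
    """Entropy of a label distribution given as raw class counts,
    using the convention 0 * log2(0) = 0."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# The whole data set: 6 T and 6 F out of 12 examples.
print(entropy([6, 6]))  # 1.0, the maximal entropy for two states
```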


## First split

### Consider *Patrons* as the first splitting attribute

If we use *Patrons* to split the data set, there will be three subsets: **None**, **Some**, and **Full**.

- *Patrons:None*: {F, F}. The two elements with *Patrons:None* are both **F**.
- *Patrons:Some*: {T, T, T, T}. The four elements with *Patrons:Some* are all **T**.
- *Patrons:Full*: {F, T, F, F, F, T}. The six elements with *Patrons:Full* are one-third **T** and two-thirds **F**.

The entropies of these sets are, respectively:

- *Patrons:None*: -(0 * log_{2}(0) + 1 * log_{2}(1)) = 0
- *Patrons:Some*: -(1 * log_{2}(1) + 0 * log_{2}(0)) = 0
- *Patrons:Full*: -(1/3 * log_{2}(1/3) + 2/3 * log_{2}(2/3)) = 1/3 * 1.585 + 2/3 * 0.585 = 0.92

The entropy of the tree when divided according to *Patrons* is the sum of the entropies weighted by the size of each subtree.

2/12 * 0 + 4/12 * 0 + 6/12 * 0.92 = 0.46
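This weighted sum is what the tree learner tries to minimize at each split. A small sketch, reusing the `entropy` helper above; writing each subset as a (T, F) count pair is our own convention:

```python
def split_entropy(subsets):
    """Entropy of a split: each subset's entropy weighted by the
    fraction of all examples that land in that subset."""
    n = sum(sum(s) for s in subsets)
    return sum(sum(s) / n * entropy(s) for s in subsets)

# Patrons subsets as (T, F) counts: None=(0, 2), Some=(4, 0), Full=(2, 4).
print(split_entropy([(0, 2), (4, 0), (2, 4)]))  # ~0.459
```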

### Consider *Type* as the first splitting attribute

If we use *Type* to split the data set, there will be four subsets: **Burger**, **French**, **Italian**, and **Thai**.

- *Type:Burger*: {T, F, F, T}
- *Type:French*: {T, F}
- *Type:Italian*: {T, F}
- *Type:Thai*: {F, T, T, F}

The entropies of these sets are, respectively:

- *Type:Burger*: -(2/4 * log_{2}(2/4) + 2/4 * log_{2}(2/4)) = 1
- *Type:French*: -(1/2 * log_{2}(1/2) + 1/2 * log_{2}(1/2)) = 1
- *Type:Italian*: -(1/2 * log_{2}(1/2) + 1/2 * log_{2}(1/2)) = 1
- *Type:Thai*: -(2/4 * log_{2}(2/4) + 2/4 * log_{2}(2/4)) = 1

In other words, splitting the entire data set on the *Type* attribute gives us no information at all! Each of the *Type* subsets itself has entropy 1. So the entropy of the resulting tree is still 1:

4/12 * 1 + 2/12 * 1 + 2/12 * 1 + 4/12 * 1 = 1

### Consider *Hungry* as the first splitting attribute

If we use *Hungry* to split the data set, there will be two subsets: **True** and **False**.

- *Hungry:True*: {T, F, T, T, T, F, T}
- *Hungry:False*: {T, F, F, F, F}

The entropies of these sets are, respectively:

- *Hungry:True*: -(5/7 * log_{2}(5/7) + 2/7 * log_{2}(2/7)) = 0.714 * 0.486 + 0.286 * 1.81 = 0.865
- *Hungry:False*: -(1/5 * log_{2}(1/5) + 4/5 * log_{2}(4/5)) = 0.2 * 2.32 + 0.8 * 0.32 = 0.72

The entropy of the tree when divided according to *Hungry* is the sum of the entropies weighted by the size of each subtree.

7/12 * 0.865 + 5/12 * 0.72 = 0.80

### First split conclusion

The other attributes also produce trees with higher entropy than *Patrons*.

So the first split is on *Patrons*.

Since two of the subsets produced by splitting on *Patrons* are homogeneous, we need to split only the third subset (**Patrons:Full**).
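To make the conclusion concrete, here is a sketch that scores the three candidate splits computed above with the `split_entropy` helper and picks the one with the lowest weighted entropy (equivalently, the highest information gain). The (T, F) count pairs are read off the subset lists above:

```python
candidates = {
    "Patrons": [(0, 2), (4, 0), (2, 4)],
    "Type":    [(2, 2), (1, 1), (1, 1), (2, 2)],
    "Hungry":  [(5, 2), (1, 4)],
}
for attr, subsets in candidates.items():
    print(f"{attr}: {split_entropy(subsets):.2f}")
# Patrons: 0.46, Type: 1.00, Hungry: 0.80
print("best split:", min(candidates, key=lambda a: split_entropy(candidates[a])))  # Patrons
```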

## Second split

### Consider *Hungry* as the splitting attribute after *Patrons*

- *Patrons:Full & Hungry:True*: {F, T, F, T}
- *Patrons:Full & Hungry:False*: {F, F}

The entropies of these sets are, respectively:

- *Patrons:Full & Hungry:True*: 1 (since the cases are evenly split)
- *Patrons:Full & Hungry:False*: 0 (since the cases are all **F**)

The entropy of the *Patrons:Full* subtree when divided according to *Hungry* is the sum of the entropies weighted by the size of each subtree.

4/6 * 1 + 2/6 * 0 = 0.667

### Consider *Rain* as the splitting attribute after *Patrons*

- *Patrons:Full & Rain:True*: {F}
- *Patrons:Full & Rain:False*: {F, T, F, F, T}

The entropies of these sets are, respectively:

- *Patrons:Full & Rain:True*: 0 (since there is only one case)
- *Patrons:Full & Rain:False*: -(2/5 * log_{2}(2/5) + 3/5 * log_{2}(3/5)) = 0.4 * 1.32 + 0.6 * 0.737 = 0.97

The entropy of the *Patrons:Full* subtree when divided according to *Rain* is the sum of the entropies weighted by the size of each subtree.

1/6 * 0 + 5/6 * 0.97 = 0.808

### Consider *Wait Estimate* as the splitting attribute after *Patrons*

- *Patrons:Full & Wait Estimate:0-10*: { }
- *Patrons:Full & Wait Estimate:10-30*: {T, F}
- *Patrons:Full & Wait Estimate:30-60*: {F, T}
- *Patrons:Full & Wait Estimate:>60*: {F, F}

The entropies of these sets are, respectively:

- *Patrons:Full & Wait Estimate:0-10*: 0
- *Patrons:Full & Wait Estimate:10-30*: 1
- *Patrons:Full & Wait Estimate:30-60*: 1
- *Patrons:Full & Wait Estimate:>60*: 0

The entropy of the *Patrons:Full* subtree when divided according to *Wait Estimate* is the sum of the entropies weighted by the size of each subtree.

0/6 * 0 + 2/6 * 1 + 2/6 * 1 + 2/6 * 0 = 0.667
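The same scoring applies within the *Patrons:Full* subset. A sketch using the helpers above, again with (T, F) count pairs read off the subset lists (the empty 0-10 subset simply contributes weight 0):

```python
full_candidates = {
    "Hungry":        [(2, 2), (0, 2)],
    "Rain":          [(0, 1), (2, 3)],
    "Wait Estimate": [(0, 0), (1, 1), (1, 1), (0, 2)],
}
for attr, subsets in full_candidates.items():
    print(f"{attr}: {split_entropy(subsets):.3f}")
# Hungry: 0.667, Rain: 0.809, Wait Estimate: 0.667
```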

### Second split conclusion

It appears that *Hungry* and *Wait Estimate* are equally good attributes for splitting the *Patrons:Full* cases, and that none of the other attributes is any better.

Using *Wait Estimate* is the better choice since it produces subsets that can each be split with a single attribute. Each of the two remaining non-empty subsets can be split on *Type*. Interestingly, though, the types are not consistent: in one case *Type:Thai* results in **T**; in the other it results in **F**. That's why, when *Type* is used to split the subset produced by *Hungry:True*, it doesn't produce homogeneous subsets.

- *Patrons:Full & Wait Estimate:10-30 & Type:Thai*: {T}
- *Patrons:Full & Wait Estimate:10-30 & Type:Italian*: {F}

- *Patrons:Full & Wait Estimate:30-60 & Type:Thai*: {F}
- *Patrons:Full & Wait Estimate:30-60 & Type:Burger*: {T}

*Hungry* produces subsets that can't be split with a single attribute. Only one subset (*Patrons:Full & Hungry:True*) needs to be split, but apparently no single attribute splits it into homogeneous sets. See the tree on slide 20. In other words, the combination of *Wait Estimate* and *Type* is a better splitter than the combination of *Hungry* and *Type*, even though *Wait Estimate* and *Hungry* produce equally good results on their own from an entropy perspective.
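Putting it together, here is one way to encode the finished tree as nested dicts. This is a sketch of our own rather than anything from the slides; the empty 0-10 branch gets `None` since no examples reach it:

```python
tree = {
    "Patrons": {
        "None": False,                 # both None cases are F
        "Some": True,                  # all four Some cases are T
        "Full": {
            "Wait Estimate": {
                "0-10": None,          # no examples fall here
                "10-30": {"Type": {"Thai": True, "Italian": False}},
                "30-60": {"Type": {"Thai": False, "Burger": True}},
                ">60": False,          # both >60 cases are F
            }
        },
    }
}

def classify(node, example):
    """Follow the example's attribute values down to a leaf."""
    while isinstance(node, dict):
        attr = next(iter(node))
        node = node[attr][example[attr]]
    return node

print(classify(tree, {"Patrons": "Full", "Wait Estimate": "10-30", "Type": "Thai"}))  # True
```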