Exercise 10        Name __________________        Score __/14

Document last modified: 

  1. (2) Draw a decision tree for the following even parity training set. 
    N X3 X2 X1 X0 Target
    1 F F F F F
    2 F F F T T
    3 F F T F T
    4 F F T T F
    5 F T F F T
    6 T F F F T
    7 T F T F F
    8 T F T T T
    9 T T F T T
    10 T T T F T
    11 T T T T F

2. (2) Calculate the information content of Target.
 

p(F) = 4/11, p(T) = 7/11

I(p(F),p(T))  = I(4/11, 7/11) =

                       = -4/11 log2(4/11) -7/11 log2(7/11)
                       = 0.945 bit

3. (2) Calculate the gain of X1.

X1 T F
Target T 3 4
Target F 3 1

Remainder(X1) = [6/11*I(3/6,3/6)+5/11*I(4/5,1/5)] = 0.873 bit

Gain(X1) = 1 - Remainder(X1) = 0.945 - 0.873 = 0.072 bit


4. (2) Calculate the information content of Target after selecting the F branch of X1.

p(F) = 1/5, p(T) = 4/5

I(p(F),p(T))  = I(1/5,4/5) =

                       = -1/5 log2(1/5) -4/5 log2(4/5)
                       = 0.722 bit

 

5. (2) Calculate the gain of X3 after selecting the F branch of X1.

X3 T F
Target T 2 2
Target F 0 1

Remainder(X1) = [2/5*I(1,0)+3/5*I(1/3,2/3)] = 0.551 bit

Gain(X1) = 0.722 - Remainder(X1) = 0.722 - 0.551 = 0.171 bit

6. (3) Verify your answers in 3-5 using Figure 18.5 of text. Draw the decision tree produced by ID3.

Test X1 0.0720566251064
 X1 = T ==> Test X0 0.0817041659455
     X0 = T ==> Test X3 0.251629167388
         X3 = T ==> Test X2 1.0
             X2 = T ==> RESULT =  F
             X2 = F ==> RESULT =  T
         X3 = F ==> RESULT =  F
     X0 = F ==> Test X3 0.251629167388
         X3 = T ==> Test X2 1.0
             X2 = T ==> RESULT =  T
             X2 = F ==> RESULT =  F
         X3 = F ==> RESULT =  T
 X1 = F ==> Test X3 0.170950594455
     X3 = T ==> RESULT =  T
     X3 = F ==> Test X2 0.251629167388
         X2 = T ==> RESULT =  T
         X2 = F ==> Test X0 1.0
             X0 = T ==> RESULT =  T
             X0 = F ==> RESULT =  F

7. (1) Is the prediction for X3=F, X2=T, X1=T, X0=T correct? No

         X0 though the highest had a low information gain (Question 3), what does that say about the predictive power of our decision tree?

The attribute in nearly irrelevant and the predictive ability is nearly nil. Multi-valued attributes, for example N has 11 values (1-11) and an information gain of 0.94 bits, are poor for prediction.

    Full results, for those missed all not in training examples, x below.

F F F F F
F F F T T
F F T F T
F F T T F
F T F F T
F T F T T x
F T T F T x
F T T T F x
T F F F T
T F F T T x
T F T F F
T F T T T
T T F F T x
T T F T T
T T T F T
T T T T F