Exercise 10 Name __________________ Score __/14
Document last modified:| N | X3 | X2 | X1 | X0 | Target |
| 1 | F | F | F | F | F |
| 2 | F | F | F | T | T |
| 3 | F | F | T | F | T |
| 4 | F | F | T | T | F |
| 5 | F | T | F | F | T |
| 6 | T | F | F | F | T |
| 7 | T | F | T | F | F |
| 8 | T | F | T | T | T |
| 9 | T | T | F | T | T |
| 10 | T | T | T | F | T |
| 11 | T | T | T | T | F |
2. (2) Calculate the information content of Target.
p(F) = 4/11, p(T) = 7/11
I(p(F),p(T)) = I(4/11, 7/11) =
= -4/11 log2(4/11) -7/11 log2(7/11)
= 0.945 bit
3. (2) Calculate the gain of X1.
X1 T F Target T 3 4 Target F 3 1 Remainder(X1) = [6/11*I(3/6,3/6)+5/11*I(4/5,1/5)] = 0.873 bit
Gain(X1) = 1 - Remainder(X1) = 0.945 - 0.873 = 0.072 bit
4. (2) Calculate the information content of Target after selecting the F branch
of X1.
p(F) = 1/5, p(T) = 4/5
I(p(F),p(T)) = I(1/5,4/5) =
= -1/5 log2(1/5) -4/5 log2(4/5)
= 0.722 bit
5. (2) Calculate the gain of X3 after selecting the F branch of X1.
X3 T F Target T 2 2 Target F 0 1 Remainder(X1) = [2/5*I(1,0)+3/5*I(1/3,2/3)] = 0.551 bit
Gain(X1) = 0.722 - Remainder(X1) = 0.722 - 0.551 = 0.171 bit
6. (3) Verify your answers in 3-5 using Figure 18.5 of text. Draw the decision tree produced by ID3.
Test X1 0.0720566251064
X1 = T ==> Test X0 0.0817041659455
X0 = T ==> Test X3 0.251629167388
X3 = T ==> Test X2 1.0
X2 = T ==> RESULT = F
X2 = F ==> RESULT = T
X3 = F ==> RESULT = F
X0 = F ==> Test X3 0.251629167388
X3 = T ==> Test X2 1.0
X2 = T ==> RESULT = T
X2 = F ==> RESULT = F
X3 = F ==> RESULT = T
X1 = F ==> Test X3 0.170950594455
X3 = T ==> RESULT = T
X3 = F ==> Test X2 0.251629167388
X2 = T ==> RESULT = T
X2 = F ==> Test X0 1.0
X0 = T ==> RESULT = T
X0 = F ==> RESULT = F
7. (1) Is the prediction for X3=F, X2=T, X1=T, X0=T correct? No
X0 though the highest had a low information gain (Question 3), what does that say about the predictive power of our decision tree?
The attribute in nearly irrelevant and the predictive ability is nearly nil. Multi-valued attributes, for example N has 11 values (1-11) and an information gain of 0.94 bits, are poor for prediction.
Full results, for those missed all not in training examples, x below.
F F F F F
F F F T T
F F T F T
F F T T F
F T F F T
F T F T T x
F T T F T x
F T T T F x
T F F F T
T F F T T x
T F T F F
T F T T T
T T F F T x
T T F T T
T T T F T
T T T T F