Chapter 18
Learning from Observations  

powered by FreeFind

Modified: 

 

18.1 FORMS OF LEARNING

 

Chapter will cover learning methods for propositional logic.

 

18.2 INDUCTIVE LEARNING

 

input - x

output - f(x)

example - A pair (x, f(x))

h - Hypothesis

H - Hypothesis space, for example a set of polynomials to consider in fitting a function to data points.

Inductive learning - Given a collection of examples of f, return a function h that approximates f.

 

 

h = linear hypothesis, does not agree with f but simple and good prediction in most cases

 

h = nonlinear hypothesis, better agreement with f and good prediction in most cases

 

h = polynomial hypothesis, exact agreement with f but what about prediction

 

h = random hypothesis, precise agreement with f but poor prediction

 

 

 

18.3 LEARNING DECISION TREES - ID3

Example - decide whether to wait on a table at a restaurant

Decision tree induction terms

 

 

Decision tree induction operation

What is the will wait decision for the following attributes:
  1. Patrons some?
  2. Raining yes, Hungry yes, Alternate yes, Patrons Full, Wait estimate 10-30?
  3. Raining yes, Hungry yes, Alternate yes, Patrons Full, Wait estimate 30-60?

Represent 2 in propositional logic.

Example internal nodes

Leaf nodes

 

Each row corresponds to a path in the tree.

While in propositional logic:

A xor B Û (A Ù ØB) Ú (ØA Ù B)

corresponds to only those paths from root to a leaf with a positive outcome.

 

For the AND operation:
  • Give the truth table.
  • How many other ways could the truth table be given?
  • How many attributes?
  • How many different decisions (leaf nodes) are there?
  • Give the decision tree.
  • Can it be represented more compactly (fewer nodes) and still reach the same decisions?
  • Can XOR be represented more compactly (fewer nodes) and still reach the same decisions?

Can a program simply memorize all possible examples and use to answer questions?

How many different restaurant examples are needed to provide one for all possible decisions?

Can a decision tree provide a prediction when there is no value given to select a path at an internal node?

Can a decision tree be constructed that needs only a subset of possible decision paths?

 

Best to choose an attribute that provides greatest amount of classification information.

Patrons gives classification information for values None (negative) and Some (positive).

Type gives no classification information.

 

 

Information in a message

I(P(v1),...P(vn)) = -P(vi)log2(P(vi)) = number of bits to represent the answer

I(fair coin)  = -p(heads)log2(p(heads)) - p(tails)log2(p(tails))
                 = -1/2 log2(1/2) - 1/2 log2(1/2)
                 = -1/2 log22-1 - 1/2 log22-1
                 = -1/2(-1) -1/2(-1)
                 = 1

Assuming equal probability of restaurant Will Wait examples:

p(yes) = 6/12, p(no) = 6/12

I(p(yes),p(no))  = I(1/2, 1/2) =

                       = -1/2 log2(1/2) -1/2 log2(1/2)
                       = -1/2 * -1 -1/2 *-1
                       = 1 bit

 

Choosing the right attribute

Choose attribute that gives most information. Determine amount of information by calculating remainder information after testing an attribute.

Usually no attribute will give complete information (e.g. 1 bit) but will give some.

Any attribute A divides training examples E into subset according to values of attribute A (Patrons=2 None, 4 Some, 6 Full).

Each subset Ei has pi positive and ni negative examples so need additional I(pi/pi+ni,ni/pi+ni) bits of information

Random example has ith value of A with probability (pi+ni/p+n) so on average need, after testing A, the remaining bits of information:

Remainder(A) = S (pi+ni/p+n) I(pi/pi+ni,ni/pi+ni)

Gain is difference between number of bits originally required and the remaining bit requirement.

Gain(A) = I(pi/pi+ni,ni/pi+ni)  - Remainder(A)

Patrons
Will Wait None Some Full
Yes 0 4 2
No 2 0 4

Remainder(Patrons) =
       [2/12*I(0,1)+4/12*I(1,0)+6/12*I(2/6,4/6)] = 0.46 bit

Gain(Patrons) = 1 - Remainder(Patrons) = 1 - 0.46 = 0.54 bit


Price

Will Wait $ $$ $$$
Yes 3 2 1
No 4 0 2

Remainder(Price) =
       [7/12*I(3/7, 4/7)+2/12*I(2,0)+3/12*I(1/3,2/3)] = 0.81 bit

Gain(Price) = 1 - Remainder(Price) = 1 - 0.81 = 0.19 bit

Original decision tree

Results from Decision Tree Learning

Gain Patrons greatest: 0.54 bits

After classifying on Patrons

Hungry, Price, Rain, Type, Fri/Sat tie with: 0.25 bits

After classifying on Hungry

Type: 0.5 bits

After classifying on Type

Fri: 1.0 bits

Decision learning tree results

Test Patrons 0.540852082973
  Patrons = NONE ==> RESULT =  No
  Patrons = FULL ==> Test Hungry 0.251629167388
     Hungry = Yes ==> Test Type 0.5
         Type = Burger ==> RESULT =  Yes
         Type = Thai ==> Test Fri 1.0
             Fri = Yes ==> RESULT =  Yes
             Fri = No ==> RESULT =  No
         Type = Italian ==> RESULT =  No
     Hungry = No ==> RESULT =  No
  Patrons = SOME ==> RESULT =  Yes

Prediction:

Hungry:Yes, Patrons:FULL, Type:Thai, Fri:Yes => Yes

 

Program computation of attribute information in bits

Round 1 2 Patrons: Full 3 Hungry:Yes 4 Type: Thai
  Alternate   0.0
Bar   0.0
Fri   0.020
Hungry   0.195
Patrons   0.54
Price   0.195
Rain   0.02
Reservation   0.02
Type   0.0
Estimate   0.207
Alternate 0.109
Bar 0.0
Fri 0.109
Hungry 0.251
Price 0.251
Rain 0.044
Reservation 0.251
Type 0.251
Estimate   0.251
Alternate 0.0
Bar 0.0
Fri 0.311
Price 0.311
Rain 0.311
Reservation 0.311
Type 0.5
Estimate  0.0
Alternate 0.0
Bar 0.0
Fri 1.0
Price  0.0
Rain  1.0
Reservation 0.0
Estimate  1.0

What is the will wait decision for the following attributes:
  1. Patrons Some, Raining yes, Hungry yes, Alternate yes, Type Italian, Wait estimate 10-30?
  2. Raining yes, Hungry yes, Alternate yes, Type Thai, Patrons Full, Fri yes, Wait estimate 10-30?

 

Example (Download program and example)

 
Credit History Debt Collateral Income Risk
1 BAD HIGH NONE $0 to $15K HIGH
2 UNKNOWN HIGH NONE $15 to $35K HIGH
3 UNKNOWN LOW NONE $15 to $35K MODERATE
4 UNKNOWN LOW NONE $0 to $15K HIGH
5 UNKNOWN LOW NONE over $35K LOW
6 UNKNOWN LOW ADEQUATE over $35K LOW
7 BAD LOW NONE $0 to $15K HIGH
8 BAD LOW ADEQUATE over $35K MODERATE
9 GOOD LOW NONE over $35K LOW
10 GOOD HIGH ADEQUATE over $35K LOW
11 GOOD HIGH NONE $0 to $15K HIGH
12 GOOD HIGH NONE $15 to $35K MODERATE
13 GOOD HIGH NONE over $35K LOW
14 BAD HIGH NONE $15 to $35K HIGH

 

Decision learning tree data

H,M,L='HIGH','MODERATE','LOW'
B,U,G='BAD','UNKNOWN','GOOD'
A,NONE='ADEQUATE','NONE'

creditAttributes = ['Credit History', 'Debt', 'Collateral', 'Income']
creditExamples = [
     ([U,H,NONE,'$15 to $35K'],H),
     ([U,L,NONE,'$15 to $35K'],M),
     ([U,L,NONE,'$0 to $15K'], H),
     ([U,L,NONE,'over $35K'],  L),
     ([U,L,A,   'over $35K'],  L),
     ([B,H,NONE,'$0 to $15K'], H),
     ([B,L,NONE,'$0 to $15K'], H),
     ([B,L,A,   'over $35K'],  M),
     ([B,H,NONE,'$15 to $35K'],H),
     ([G,L,NONE,'over $35K'],  L),
     ([G,H,A,   'over $35K'],  L),
     ([G,H,NONE,'$0 to $15K'], H),
     ([G,H,NONE,'$15 to $35K'],M),
     ([G,H,NONE,'over $35K']  ,L)
    ]
creditDecisionTree = DECISION_TREE_LEARNING(creditExamples,creditAttributes, None)
creditDecisionTree.display()                                                                # print decision tree

question = {'Credit History':U, 'Debt':H, 'Collateral':NONE, 'Income':'$15 to $35K'}
print 'Question: ', question
print 'Prediction: Risk ', creditDecisionTree.predict(question)                  # print prediction

 

Decision Learning Tree results

 Test Income 0.966323671285
 Income = over $35K -> Test Credit History 0.650022421648
     Credit History = UNKNOWN -> RESULT =  LOW
     Credit History = BAD => RESULT =  MODERATE
     Credit History = GOOD -> RESULT =  LOW
 Income = $0 to $15K -> RESULT =  HIGH
 Income = $15 to $35K -> Test Credit History 0.5
     Credit History = UNKNOWN -> Test Debt 1.0
         Debt = HIGH -> RESULT =  HIGH
         Debt = LOW -> RESULT =  MODERATE
     Credit History = BAD -> RESULT =  HIGH
     Credit History = GOOD -> RESULT =  MODERATE

Prediction
  Collateral: NONE, Debt: HIGH, Credit History: UNKNOWN,
  Income: $15 to $35K -> Risk: HIGH

 

Program computation of attribute information in bits

Round 1 2 Income: $15 to $35K 3 Income: Over $35K
  Credit History 0.265
Debt 0.062
Collateral 0.206
Income 0.966



 
Credit History 0.5
Debt 0.311
Collateral 0.0

 

Credit History: Unknown
Debt 1.0
Collateral 0.0
Credit History 0.65
Debt 0.109
Collateral 0.19
 

 

Credit Risk Assessment

What is the Risk decision prediction for the following attributes:
  1. Income $0 to $15K?
  2. Income $15 to $35K, Credit History Unknown, Debt Low?

Does Collateral attribute contribute any information gain?

 

Information Theoretic Test Selection

 
Credit History Debt Collateral Income Risk
1 BAD HIGH NONE $0 to $15K HIGH
2 UNKNOWN HIGH NONE $15 to $35K HIGH
3 UNKNOWN LOW NONE $15 to $35K MODERATE
4 UNKNOWN LOW NONE $0 to $15K HIGH
5 UNKNOWN LOW NONE over $35K LOW
6 UNKNOWN LOW ADEQUATE over $35K LOW
7 BAD LOW NONE $0 to $15K HIGH
8 BAD LOW ADEQUATE over $35K MODERATE
9 GOOD LOW NONE over $35K LOW
10 GOOD HIGH ADEQUATE over $35K LOW
11 GOOD HIGH NONE $0 to $15K HIGH
12 GOOD HIGH NONE $15 to $35K MODERATE
13 GOOD HIGH NONE over $35K LOW
14 BAD HIGH NONE $15 to $35K HIGH

Information in a message

I(P(v1),...P(vn)) = -P(vi)log2(P(vi)) = number of bits to represent the message

Assuming equal probability of Risk examples:

p(HIGH RISK) = 6/14, p(MODERATE RISK) = 3/14, p(LOW RISK) = 5/14

I(Risk)  = -6/14 log2(6/14) -3/14 log2(3/14) -5/14 log2(5/14)
           = -6/14 * -1.222 -3/14 *-2.222 -5/14 * -1.485
           = 1.531 bits

Remainder(A) = S (pi+ni/p+n) I(pi/pi+ni,ni/pi+ni)

Gain(A) = I(pi/pi+ni,ni/pi+ni)  - Remainder(A)

Income
Risk $0 to $15K $15K to $35K over $35K
High 4 2 0
Medium 0 2 1
Low 0 0 5

Remainder(Income) =
       [4/14*I(4/4,0,0)+4/14*I(2/4,2/4,0)+6/14*I(1/6,5/6)] = 0.564 bit

Gain(Income) = 1.531 - Remainder(Income) = 1.531 - 0.564 = 0.966 bit

The gain on selecting Credit History first is about 0.265.

Give the formula to compute the information gain on Credit History.