Chapter 13
Uncertainty  

13.1 ACTING UNDER UNCERTAINTY

 

 or P(die < 4) = P(1)+P(2)+P(3) = 1/2

 

What is the probability of the event of drawing an Ace from a deck of cards, P(Ace)?

 

 for a die numbers 1, 3 and 5.

 

Example

Lucky(Ace) = true

P(Lucky=true) = 1/52 + 1/52 + 1/52 + 1/52 = 4/52 = 1/13
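
The counting arguments above can be checked by enumerating equally likely outcomes. The following is a minimal sketch, assuming a fair six-sided die and a standard 52-card deck; the helper name `prob` is hypothetical and not part of these notes.

```python
from fractions import Fraction
from itertools import product

def prob(event, outcomes):
    """Probability of an event when all outcomes are equally likely."""
    outcomes = list(outcomes)
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

# Assumed sample spaces: a fair die and a standard 52-card deck.
die = range(1, 7)
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))

p_die_lt4 = prob(lambda n: n < 4, die)             # P(1)+P(2)+P(3)
p_ace = prob(lambda card: card[0] == "A", deck)    # four Aces, 1/52 each

print(p_die_lt4, p_ace)
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculations.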

 

13.2 BASIC PROBABILITY NOTATION

 

For Boolean random variables

Odd and LT4 (i.e. numbers less than 4)

What is:
  • LT4(5)
  • ¬lt4
  • P(¬lt4)
  • odd ∨ lt4
  • P(Odd ∨ LT4)

 

 

P(A ∨ B) = P(A) + P(B) - P(A ∧ B)

P(A ∧ B) is subtracted because the intersection of A and B is counted twice, once in P(A) and once in P(B).
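
The rule for P(A ∨ B) can be verified on the die events Odd and LT4. This is a small sketch, assuming a fair six-sided die; the set names `odd` and `lt4` and the helper `p` are illustrative only.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                  # fair six-sided die (assumed)
odd = {n for n in outcomes if n % 2 == 1}    # {1, 3, 5}
lt4 = {n for n in outcomes if n < 4}         # {1, 2, 3}

def p(event):
    """Probability of a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

lhs = p(odd | lt4)                       # P(Odd ∨ LT4) by direct counting
rhs = p(odd) + p(lt4) - p(odd & lt4)     # inclusion-exclusion

print(lhs, rhs)
```

Without the subtraction, the outcomes 1 and 3 would be counted in both P(Odd) and P(LT4).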

 

 

Example

The prior or unconditional probability of the event of drawing the Ace of spades from a deck of cards is P(Ace of spades) = 1/52.

After drawing a card, P(Ace of spades) = 0 or 1

 

Example - Joint probability table:

P(A,B)

       a      ¬a
b      0.11   0.09
¬b     0.63   0.17

  • What is the prior probability P(b)?
  • Of ¬a ∨ b?

 

 

Example: P(A,B)

       a      ¬a
b      0.11   0.09
¬b     0.63   0.17

  • What is the conditional probability P(a|b)?
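
Questions like these reduce to summing entries of the joint table. The sketch below stores P(A,B) as a dictionary keyed by (a, b) and sums the entries where an event holds; the helper `prob` is a hypothetical name introduced for illustration.

```python
# Joint distribution P(A,B) from the table, keyed by (a, b).
joint = {
    (True, True): 0.11,    # a,  b
    (False, True): 0.09,   # ¬a, b
    (True, False): 0.63,   # a,  ¬b
    (False, False): 0.17,  # ¬a, ¬b
}

def prob(event):
    """Sum the joint entries for which event(a, b) is true."""
    return sum(p for (a, b), p in joint.items() if event(a, b))

p_b = prob(lambda a, b: b)                        # marginal (prior) of b
p_not_a_or_b = prob(lambda a, b: (not a) or b)    # P(¬a ∨ b)
p_a_given_b = prob(lambda a, b: a and b) / p_b    # P(a|b) = P(a ∧ b) / P(b)

print(round(p_b, 2), round(p_not_a_or_b, 2), round(p_a_given_b, 2))
```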

 

13.4 INFERENCE USING FULL JOINT DISTRIBUTIONS

 

Full joint distribution: P(Toothache, Cavity, Catch) represented in 2x2x2 table.

 

What is  P(cavity), the probability the proposition cavity is true?

 

What is  P(cavity ∧ ¬toothache)?

 

What is  P(cavity | toothache)?

 

 

Normalization - the process whereby the posterior (conditional) probabilities for the values of a variable are divided by a fixed value so that they sum to 1. It is a useful shortcut in probability calculations because P(A) = 1 - P(¬A).

α of P(cavity, toothache) = 1 / P(toothache) = 1 / (0.012 + 0.108 + 0.016 + 0.064) = 1 / 0.2 = 5

α of P(toothache, catch)
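
Inference by enumeration and normalization can be sketched in a few lines. Only the four toothache entries (0.108, 0.012, 0.016, 0.064) appear in these notes; the four ¬toothache entries below are an assumption taken from the standard dentist example in the textbook, and the helper `prob` is a hypothetical name.

```python
# Full joint P(Toothache, Cavity, Catch), keyed by (toothache, cavity, catch).
joint = {
    (True, True, True): 0.108,
    (True, True, False): 0.012,
    (True, False, True): 0.016,
    (True, False, False): 0.064,
    # Assumed values (not listed in these notes):
    (False, True, True): 0.072,
    (False, True, False): 0.008,
    (False, False, True): 0.144,
    (False, False, False): 0.576,
}

def prob(event):
    """Sum the joint entries for which event(toothache, cavity, catch) holds."""
    return sum(p for world, p in joint.items() if event(*world))

p_cavity = prob(lambda t, c, k: c)
p_toothache = prob(lambda t, c, k: t)
alpha = 1 / p_toothache   # normalization constant

# Unnormalized values P(cavity ∧ toothache) and P(¬cavity ∧ toothache).
unnormalized = [prob(lambda t, c, k: t and c), prob(lambda t, c, k: t and not c)]
posterior = [alpha * u for u in unnormalized]   # P(Cavity | toothache)

print(round(p_cavity, 3), round(alpha, 3), [round(x, 3) for x in posterior])
```

Multiplying by α avoids computing P(toothache) separately for each value of Cavity, since the normalized values must sum to 1.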

 

 

Inference by enumeration works, but why is it often not practical?

 

13.5 INDEPENDENCE

 

  • B conditionally dependent on A

P(A ∧ B) = P(B|A) * P(A)

 

  • Independent A and B

P(A ∧ B) = P(B) * P(A)

  • Toothache depends on cavity
  • Catch depends on cavity
  • Weather is independent

P(Toothache|Cavity)

P(Catch|Cavity)

 

  • The full joint distribution has size 8; conditional independence reduces the size to 5. Use the definition of conditional probability to demonstrate.
  • Suppose we had 32 Boolean variables, what is the size of the full joint distribution?
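
The growth in table size can be made concrete with a short calculation. This sketch assumes the count of 5 comes from one number for P(Cavity) plus two numbers each for P(Toothache|Cavity) and P(Catch|Cavity); that reading of the notes, and both function names, are assumptions.

```python
def full_joint_size(n_boolean_vars):
    """Number of entries in the full joint table of n Boolean variables."""
    return 2 ** n_boolean_vars

def factored_size(n_effects):
    """Numbers needed for one Boolean cause with n conditionally
    independent Boolean effects: 1 for P(Cause), plus 2 per effect
    for P(Effect | cause) and P(Effect | ¬cause)."""
    return 1 + 2 * n_effects

print(full_joint_size(3))    # Toothache, Cavity, Catch
print(factored_size(2))      # Cavity with effects Toothache and Catch
print(full_joint_size(32))
```

The full joint table doubles with every added variable, while the factored form grows by two numbers per added effect.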

 

13.6 BAYES' RULE AND ITS USE

 

What does the above say? P(m|s)

Calculate P(s|m), the probability that someone who has meningitis has a stiff neck.

 

Example

       A      ¬A
B      0.11   0.09
¬B     0.63   0.17

Bayes' Rule
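
Bayes' rule, P(A|B) = P(B|A) * P(A) / P(B), can be checked against the joint table in the example above. The sketch below computes P(A|B) both directly from the table and through Bayes' rule; the variable names are illustrative.

```python
# Joint distribution from the example table, keyed by (A, B).
joint = {
    (True, True): 0.11,
    (False, True): 0.09,
    (True, False): 0.63,
    (False, False): 0.17,
}

p_a = joint[(True, True)] + joint[(True, False)]   # marginal P(A)
p_b = joint[(True, True)] + joint[(False, True)]   # marginal P(B)
p_b_given_a = joint[(True, True)] / p_a            # P(B|A)

p_a_given_b_direct = joint[(True, True)] / p_b     # definition of P(A|B)
p_a_given_b_bayes = p_b_given_a * p_a / p_b        # Bayes' rule

print(round(p_a_given_b_direct, 4), round(p_a_given_b_bayes, 4))
```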

 

Example: Medical diagnosis

       HT        ¬HT
C      0.00008   0.00002
¬C     0.00092   0.99918

How is P(HT|C) calculated from the joint probability table?

Verify P(HT|C)=0.80

What is P(¬HT|¬C)?

 

BAYESIAN BELIEF NETWORKS

 

A Bayesian belief network is represented as a directed acyclic graph.

 

Independent A and B

P(B ∧ A) = P(A) * P(B)

and

P(B|A) = P(B)

that is, the likelihood of B is unaffected by whether or not A occurs.

 

Dependent B on A

P(B ∧ A) = P(A) * P(B|A)

P(B|A) ≠ P(B)

 

Serial      P(A,V,B) = P(A)*P(V|A)*P(B|V)
Diverging   P(V,A,B) = P(V)*P(A|V)*P(B|V)
Converging  P(A,B,V) = P(A)*P(B)*P(V|A,B)

 

Example - Bayesian belief network

  • P(A) = 0.1
  • P(B) = 0.7
  • P(C|A) = 0.2
  • P(C|¬A) = 0.4
  • P(D|A∧B) = 0.5
  • P(D|A∧¬B) = 0.4
  • P(D|¬A∧B) = 0.2
  • P(D|¬A∧¬B) = 0.0001
  • P(E|B) = 0.2
  • P(E|¬B) = 0.1

A has only a prior probability since it is independent.
B has only a prior probability since it is independent.
C is dependent on A, 2 cases: A and ¬A.

D is dependent on A and B, 4 cases.

E is dependent on B, 2 cases: B and ¬B.

 

Expressed as conditional probability tables:

P(A)
0.1

P(B)
0.7

A      P(C)
true   0.2
false  0.4

B      P(E)
true   0.2
false  0.1

A      B      P(D)
true   true   0.5
true   false  0.4
false  true   0.2
false  false  0.0001

Given A and B are true, P(D) = 0.5
etc.

 

Joint probability using definition of conditional probability

P(X,Y) = P(X|Y)*P(Y)

Hence

P(A,B,C,D,E) = P(E|A,B,C,D)*P(A,B,C,D)

applying this rule recursively:

P(A,B,C,D,E) = P(E|A,B,C,D)*P(D|A,B,C)*P(C|A,B)*P(B|A)*P(A)

Observing that:

E is not dependent on A, C or D

P(E|A,B,C,D) = P(E|B)

C is dependent only on A

P(C|A,B) = P(C|A)

D is dependent only on A and B (A and B are independent)

P(D|A,B,C) = P(D|A,B)

                = P(A,B|D)*P(D) / P(A ∧ B)

B is independent of A so

P(B|A) = P(B)

can reduce

P(A,B,C,D,E) = P(E|A,B,C,D)*P(D|A,B,C)*P(C|A,B)*P(B|A)*P(A)

                   = P(E|B)*P(D|A,B)*P(C|A)*P(B)*P(A)

 

Note that to calculate joint probability, the nodes must be ordered such that if a node X is dependent on node Y, Y appears before X. Either of the following would work:

P(B,A,C,D,E)

P(A,C,B,D,E)
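
The reduced product P(E|B)*P(D|A,B)*P(C|A)*P(B)*P(A) can be evaluated directly from the conditional probability tables of the A to E example. This is a minimal sketch; the helper names `bern` and `joint` are hypothetical.

```python
from itertools import product

# Conditional probability tables from the example (probability of "true").
P_A = 0.1
P_B = 0.7
P_C_given_A = {True: 0.2, False: 0.4}
P_E_given_B = {True: 0.2, False: 0.1}
P_D_given_AB = {
    (True, True): 0.5,
    (True, False): 0.4,
    (False, True): 0.2,
    (False, False): 0.0001,
}

def bern(p_true, value):
    """Probability that a Boolean variable takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A)*P(B)*P(C|A)*P(D|A,B)*P(E|B)."""
    return (bern(P_A, a) * bern(P_B, b)
            * bern(P_C_given_A[a], c)
            * bern(P_D_given_AB[(a, b)], d)
            * bern(P_E_given_B[b], e))

p_all_true = joint(True, True, True, True, True)
total = sum(joint(*w) for w in product([True, False], repeat=5))

print(p_all_true, total)
```

Summing the product over all 32 assignments gives 1 (up to floating-point rounding), which is a quick sanity check that the factorization defines a valid joint distribution.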

 

Example

 

P(C)
0.2

C      P(S)
true   0.8
false  0.2

C      P(P)
true   0.6
false  0.5

S      P      P(E)
true   true   0.6
true   false  0.9
false  true   0.1
false  false  0.2

P      P(F)
true   0.9
false  0.7


 

P(¬C) = 1 - P(C)

P(F|P) = 0.9 since P is true

P(E|S∧P) = 0.6 since S and P are true

What is the probability of:
  • having fun given that you party?
  • passing exams if you don't study and don't party?

P(C=true, S=true, P=false, E=true, F=false) or P(C,S,¬P,E,¬F)

the probability that you will go to college, study, not party, pass exams and not have fun!

P(C,S,¬P,E,¬F) = P(C)*P(S|C)*P(¬P|C)*P(E|S∧¬P)*P(¬F|¬P)

                      = .2*.8*.4*.9*.3

                      = 0.01728

In a Bayesian belief network, because there is no direct connection between C and E, E is independent of C, given S and P.

P(E|C∧S∧P) = P(E|S∧P) = 0.6
 

Compute the probability of going to college, studying, partying, passing exams and having fun.

More complex conditional probabilities can also be calculated, for example the probability of passing exams given that you have fun, go to college, study and do not party:

P(E|F∧C∧S∧¬P) = P(E|S∧¬P) = 0.9

 

 

Diagnoses

Can make diagnoses by determining posterior probabilities.

Want to determine whether you partied or not, given that you went to college, studied, had fun and passed exams.

P(C∧S∧F∧E∧P) = P(C)*P(S|C)*P(P|C)*P(E|S∧P)*P(F|P)

                    = .2*.8*.6*.6*.9 = .05184

P(C∧S∧F∧E∧¬P) = P(C)*P(S|C)*P(¬P|C)*P(E|S∧¬P)*P(F|¬P)

                      = .2*.8*.4*.9*.7 = .04032

so more likely you partied.
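
The diagnosis can be taken one step further by normalizing the two joint probabilities into a posterior for P. The sketch below encodes the conditional probability tables of the college example; the helper names `bern` and `joint` are hypothetical, and the normalization step is an addition to the comparison made in the notes.

```python
# Conditional probability tables of the college example (probability of "true").
P_C = 0.2
P_S_given_C = {True: 0.8, False: 0.2}
P_P_given_C = {True: 0.6, False: 0.5}
P_E_given_SP = {
    (True, True): 0.6,
    (True, False): 0.9,
    (False, True): 0.1,
    (False, False): 0.2,
}
P_F_given_P = {True: 0.9, False: 0.7}

def bern(p_true, value):
    """Probability that a Boolean variable takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(c, s, p, e, f):
    """P(C,S,P,E,F) = P(C)*P(S|C)*P(P|C)*P(E|S,P)*P(F|P)."""
    return (bern(P_C, c)
            * bern(P_S_given_C[c], s)
            * bern(P_P_given_C[c], p)
            * bern(P_E_given_SP[(s, p)], e)
            * bern(P_F_given_P[p], f))

with_party = joint(True, True, True, True, True)       # C, S, P, E, F
without_party = joint(True, True, False, True, True)   # C, S, ¬P, E, F

# Normalize over the two values of P to get P(P | C, S, E, F).
posterior_party = with_party / (with_party + without_party)

print(round(with_party, 5), round(without_party, 5), round(posterior_party, 4))
```

The posterior is only slightly above one half, so the evidence favours having partied, but not strongly.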

 

NAIVE BAYES' CLASSIFIER

Training Example

x y z Classification
2 3 2 A
4 1 4 B
1 3 2 A
2 4 3 A
4 2 4 B
2 1 3 C
1 2 4 A
2 3 3 B
2 2 4 A
3 3 3 C
3 2 1 A
1 2 1 B
2 1 4 A
4 3 4 C
2 2 4 A

 

Summary classification table

A's - 8
B's - 4
C's - 3
      15 total

A
value  x  y  z
1      2  1  1
2      5  4  2
3      1  2  1
4      0  1  4

B
value  x  y  z
1      1  1  1
2      1  2  0
3      0  1  1
4      2  0  2

C
value  x  y  z
1      0  1  0
2      1  0  0
3      1  2  2
4      1  0  1

 

Classify: (x=2, y=3, z=4)  

Use P(ci) * Π P(dj|ci), the product taken over the attributes dj, to compute the (unnormalized) posterior probability of class ci

P(A)*P(x=2|A)*P(y=3|A)*P(z=4|A) =
8/15*5/8        *2/8        *4/8         =0.0417 maximum

P(B)*P(x=2|B)*P(y=3|B)*P(z=4|B) =
4/15*1/4        *1/4        *2/4         =0.0083

P(C)*P(x=2|C)*P(y=3|C)*P(z=4|C) =
3/15*1/3        *2/3        *1/3         =0.015

Classify as A since it has the maximum posterior probability.
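
The same classification can be reproduced by counting directly from the training examples. This is a minimal sketch of a naive Bayes classifier over the 15 training rows; the names `training`, `score` and `classify` are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict

# Training examples from the table: ((x, y, z), classification).
training = [
    ((2, 3, 2), "A"), ((4, 1, 4), "B"), ((1, 3, 2), "A"),
    ((2, 4, 3), "A"), ((4, 2, 4), "B"), ((2, 1, 3), "C"),
    ((1, 2, 4), "A"), ((2, 3, 3), "B"), ((2, 2, 4), "A"),
    ((3, 3, 3), "C"), ((3, 2, 1), "A"), ((1, 2, 1), "B"),
    ((2, 1, 4), "A"), ((4, 3, 4), "C"), ((2, 2, 4), "A"),
]

class_counts = Counter(label for _, label in training)

# value_counts[(label, attribute index, value)] = number of matching examples
value_counts = defaultdict(int)
for features, label in training:
    for i, value in enumerate(features):
        value_counts[(label, i, value)] += 1

def score(features, label):
    """P(label) times the product of P(attribute value | label)."""
    result = class_counts[label] / len(training)
    for i, value in enumerate(features):
        result *= value_counts[(label, i, value)] / class_counts[label]
    return result

def classify(features):
    """Return the class with the largest score."""
    return max(class_counts, key=lambda label: score(features, label))

for label in sorted(class_counts):
    print(label, round(score((2, 3, 4), label), 4))
print(classify((2, 3, 4)))
```

Because the score is a plain product, a single zero count for any attribute value forces the whole score for that class to zero.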

 

Problems - when there is no training data from which to calculate a probability

Classify: (x=1, y=2, z=2)

P(A)*P(x=1|A)*P(y=2|A)*P(z=2|A) =
8/15*2/8        *4/8        *2/8         =0.0167

P(B)*P(x=1|B)*P(y=2|B)*P(z=2|B) =
4/15*1/4        *2/4        *0/4         = 0

P(C)*P(x=1|C)*P(y=2|C)*P(z=2|C) =
3/15*0/3        *0/3        *0/3         = 0

 

m-estimate - an estimate of the probability of a specific attribute value given a specific classification.

(a + m*p) / (b + m)

a = number of training examples that match the attribute value and the classification (for P(x=1|C), a is the number of training examples where x=1 that are categorized as C; in the example, a=0)

b = total number of training examples categorized as C; in the example, b=3

p = prior estimate of the probability being sought. Usually each attribute value is assumed equally likely; in the example, x has four values (1, 2, 3 or 4), so for P(x=1|C), p=1/4; for P(x=2|C), p=1/4, etc.

m = constant known as the equivalent sample size.

 

Example

Calculate the m-estimate for x=1 given a classification of C; P(x=1|C).

Pick m = 5.

(a + m*p) / (b + m) = (0 + 5*1/4) / (3 + 5) = 0.156

 

Classify: (x=1, y=2, z=2). Use P(ci) * Π P(dj|ci), with each P(dj|ci) replaced by its m-estimate (m = 5, p = 1/4).

Category A

P(x=1|A) = (2 + 5/4) / (8 + 5) = 0.25
P(y=2|A) = (4 + 5/4) / (8 + 5) = 0.404
P(z=2|A) = (2 + 5/4) / (8 + 5) = 0.25

P(A)*P(x=1|A)*P(y=2|A)*P(z=2|A) =
8/15*0.25      *0.404      *0.25        =0.0135

Category B

P(x=1|B) = (1 + 5/4) / (4 + 5) = 0.25
P(y=2|B) = (2 + 5/4) / (4 + 5) = 0.36
P(z=2|B) = (0 + 5/4) / (4 + 5) = 0.139

P(B)*P(x=1|B)*P(y=2|B)*P(z=2|B) =
4/15*0.25       *0.36      *0.139      = 0.0033

Category C

P(x=1|C) = (0 + 5/4) / (3 + 5) = 0.156
P(y=2|C) = (0 + 5/4) / (3 + 5) = 0.156
P(z=2|C) = (0 + 5/4) / (3 + 5) = 0.156

P(C)*P(x=1|C)*P(y=2|C)*P(z=2|C) =
3/15*0.156     *0.156     *0.156      = 0.0008

 

A was the correct classification after all.
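
The m-estimate calculation can be sketched as follows. The match counts are read from the summary classification table for x=1, y=2 and z=2 (for class A: 2, 4 and 2); m = 5 and p = 1/4 follow the example, while the names `m_estimate`, `match_counts` and `score` are hypothetical.

```python
class_counts = {"A": 8, "B": 4, "C": 3}

# Number of training examples in each class with x=1, y=2, z=2 respectively,
# read from the summary classification table.
match_counts = {
    "A": (2, 4, 2),
    "B": (1, 2, 0),
    "C": (0, 0, 0),
}

def m_estimate(a, b, m=5, p=0.25):
    """(a + m*p) / (b + m), with equivalent sample size m and prior p."""
    return (a + m * p) / (b + m)

def score(label):
    """P(label) times the product of the m-estimates for the three attributes."""
    total = sum(class_counts.values())
    result = class_counts[label] / total
    for a in match_counts[label]:
        result *= m_estimate(a, class_counts[label])
    return result

scores = {label: score(label) for label in class_counts}
best = max(scores, key=scores.get)

for label, value in scores.items():
    print(label, round(value, 4))
print(best)
```

With the m-estimate no class scores exactly zero, so classes B and C remain comparable to A even though some attribute values never occurred with them in training.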

 

13.7 WUMPUS WORLD REVISITED

 

 

Consider 3 cases below:

P1,3 ∧ P2,2 ∧ P3,1

P1,3 ∧ P2,2 ∧ ¬P3,1

P1,3 ∧ ¬P2,2 ∧ P3,1

Do not consider the following because not possible given what is known:

P1,3 ∧ ¬P2,2 ∧ ¬P3,1

 

P(P1,3=true|known,b) = 0.31

P(P1,3=false|known,b) = 0.69

 

Recall that summing over the fringe (P2,2 and P3,1) is analogous to summing over a (hyper-dimensional) row of a joint distribution table, in this case where known is true and P1,3=false or true.

P(P1,3=true)=0.2   
  • P(P2,2=true)=0.2    P(P3,1=true)=0.2
  • P(P2,2=true)=0.2    P(P3,1=false)=0.8
  • P(P2,2=false)=0.8    P(P3,1=true)=0.2
P(P1,3=false)=0.8
  • P(P2,2=true)=0.2    P(P3,1=true)=0.2
  • P(P2,2=true)=0.2    P(P3,1=false)=0.8

Why are only P2,2 and P3,1 considered for P1,3=false?

Why can OTHER be excluded from the calculation of conditional probability for P1,3?
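
The values 0.31 and 0.69 can be reproduced by enumerating the fringe. This sketch assumes the textbook scenario in which breezes were observed in squares [1,2] and [2,1], so a pit is needed in [1,3] or [2,2], and in [2,2] or [3,1]; that consistency test and the function names are assumptions, while the pit prior of 0.2 comes from the notes.

```python
from itertools import product

PIT = 0.2   # prior probability of a pit in any square

def consistent(p13, p22, p31):
    """True if the pit configuration explains both assumed breezes."""
    breeze_12 = p13 or p22   # neighbours of [1,2] in the query/fringe
    breeze_21 = p22 or p31   # neighbours of [2,1] in the fringe
    return breeze_12 and breeze_21

def prior(*pits):
    """Product of independent pit priors for the given squares."""
    result = 1.0
    for pit in pits:
        result *= PIT if pit else 1.0 - PIT
    return result

# Sum the prior over the fringe models consistent with the breezes.
weights = {}
for p13 in (True, False):
    weights[p13] = sum(
        prior(p13, p22, p31)
        for p22, p31 in product((True, False), repeat=2)
        if consistent(p13, p22, p31)
    )

alpha = 1 / sum(weights.values())   # normalization constant
posterior = {value: alpha * w for value, w in weights.items()}

print(round(posterior[True], 2), round(posterior[False], 2))
```

Three fringe models survive when P1,3 is true and two when it is false, matching the cases listed above.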

 
