Chapter 20
Statistical Learning  

powered by FreeFind

Modified: 

 

 

 

Example - classify OR operation

x0 x1 W0 W1 in a
0 0 0.5 0.5 0.0 0
0 1 0.5 0.5 0.5 1
1 0 0.5 0.5 0.5 1
1 1 0.5 0.5 1.0 1

 

 

 

Positive bias of W0 moves in higher toward threshold of 0.

# Figure 20-15 of AIMA from Russell and Norvig
# Simple feedforward network and demonstration of recognizing AND, OR and NOT
# using threshold activation function

 
def g(In) :                                                          # Threshold activation function
   if In > 0.0 : return 1
   else :          return 0

def PERCEPTRON_TEST(a, W) :                             # a is inputs, W is weights
   In = 0.0                                                          # One output
   for j in range(len(W)) :
      In = In + W[j]*a[j]
   return g(In)

print 'AND'               #   a[0, 1, 2] W[0, 1, 2]
print PERCEPTRON_TEST([-1, 0, 0], [1.5, 1, 1])    # 0
print PERCEPTRON_TEST([-1, 0, 1], [1.5, 1, 1])    # 0  
print PERCEPTRON_TEST([-1, 1, 0], [1.5, 1, 1])    # 0
print PERCEPTRON_TEST([-1, 1, 1], [1.5, 1, 1])    # 1

print 'NOT'
print PERCEPTRON_TEST([-1, 0], [-0.5, -1])         # 1
print PERCEPTRON_TEST([-1, 1], [-0.5, -1])         # 0

Give a similar definition for OR operation from the illustration above.

 

 

 

 

 

Example - classify AND operation

x0 x1 W0 W1 in a
0 0 0.5 0.5 0.0 0
0 1 0.5 0.5 0.5 1
1 0 0.5 0.5 0.5 1
1 1 0.5 0.5 1.0 1

Error when output overshoots correct answer.

Can adjust:

Adding a bias of -0.75 gives:

x0 x1 W0 W1 Wbias in a
0 0 0.5 0.5 -0.75 -0.75 0
0 1 0.5 0.5 -0.75 -0.25 0
1 0 0.5 0.5 -0.75 -0.25 0
1 1 0.5 0.5 -0.75 0.25 1

 

 

Learning

W[j] = W[j] + (alpha * Err * x[j])

Example

# Figure 20.21 from AIMA by Russell and Norvig
# using threshold activation function

def PERCEPTRON_LEARNING(examples, network) :
   alpha = 0.2
   W = network
   for epoch in range(3) :
      for e in examples :
         (x, y) = e
         sum = 0.0
         for j in range(len(x)) :
            sum = sum + W[j]*x[j]                         # Compute activation output
         Err = y - g( sum )
         for j in range(len(x)) :                              # Learn by updating weights
            W[j] = W[j] + (alpha * Err * x[j])           # based on error
   return W

def PERCEPTRON_TEST(examples, network) :
    W = network
    result = []
    for e in examples :
      (x, _) = e
      In = 0.0
      for j in range(len(x)) :
         In = In + W[j]*x[j]
      result.append(g(In))
    return result

def g(In) :                                                          # Threshold activation function
   if In > 0.0 : return 1                                         # Ex. In > 0.0 for OR
   else :          return 0                                         #      In > .75 for AND

OR = [ ([0,0], 0),
          ([0,1], 1),
          ([1,0], 1),
          ([1,1], 1)]

NN = [-0.2, 0.4]

print PERCEPTRON_LEARNING(OR, NN) # [0.200000000001, 0.40000000002]
print OR                                             # [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print PERCEPTRON_TEST(OR, NN)         # [0, 1, 1, 1]
  • Give the definition of the AND training set.
  • Try different values of the threshold cutoff in function g. How does that affect learning?

 

Example - Learning OR Boolean operation

sum=sum+W[j]*x[j]

What is the next activation?
Sample learning OR operation
epoch x0 x1 Expected
Y
Actual
Y
Error W0 W1
0 0 0 0 0 0 -0.2 0.4
0 0 1 1 1 0 -0.2 0.4
0 1 0 1 0 1 0.0 0.4
0 1 1 1 1 0 0.0 0.4
1 0 0 0 0 0 0.0 0.4
1 0 1 1 1 0 0.0 0.4
1 1 0 1 0 1 0.2 0.4
1 1 1 1 1 0 0.2 0.4
2 0 0 0 0 0 0.2 0.4
2 0 1 1 1 0 0.2 0.4
2 1 0 1 1 0 0.2 0.4
2 1 1 1 1 0 0.2 0.4

 

Learning

Problem

          X = w1I1+w2I2

           Y = 1 for X > t, 0 for X <= t

 

Problem

Back Propagation

Training Algorithm

            for j in range(len(ah)) :               # activate hidden layer
                ini = 0.0
                for k in range(len(ai)) :
                    ini = ini+Wih[j][k]*ai[k]
                ah[j] = g(ini)                           # hidden activation
                
            for i in range(len(ao)) :                 # activate output layer
                ini = 0.0
                for j in range(len(ah)) :           
                    ini = ini+Who[i][j]*ah[j]
                ao[i] = g(ini)                            # output activation
                deltao[i] = gp(ao[i])*(y[i]-ao[i])  # output error gradient
                           
            for k in range(len(ah)) :
                error = 0.0
                for j in range(len(y)) :             # assign feedback
                    error = error + Who[j][k]*deltao[j]
                deltah[k] = gp(ah[k]) * error   # hidden error gradient

            for j in range(len(ah)) :
                for i in range(len(ao)) :           # update output weights
                    Who[i][j] = Who[i][j] + (alpha * ah[j] * deltao[i])

            for k in range(len(ai)) :
                for j in range(len(ah)) :           # update hidden weights
                    Wih[j][k] = Wih[j][k] + (alpha * ai[k] * deltah[j])

Example

def g(x):  
    return math.tanh(x) 

 

# Figure 20.25 from AIMA by Russell and Norvig. Back-propagation Neural Net

import math

def g(x):                             # tanh faster than the standard 1/(1+e^-x)
    return math.tanh(x)

def gp(y):                                             # derivative of g
    return 1.0-y*y

def BACK_PROP_LEARNING(examples, network) :
    alpha = 0.2
    (Wih, Who) = network
    for epoch in range(4000) :
        for (x,y) in examples :
            ai=x[:]                                      # inputs/outputs
            deltao = [0.0]*len(y)
            deltah = [0.0]*len(Who[0])
            ah = [0.0]*len(Who[0])
            ao = [0.0]*len(y)

            for j in range(len(ah)) :               # activate hidden layer
                ini = 0.0
                for k in range(len(ai)) :
                    ini = ini+Wih[j][k]*ai[k]
                ah[j] = g(ini)                           # hidden activation
               
            for i in range(len(ao)) :                 # activate output layer
                ini = 0.0
                for j in range(len(ah)) :           
                    ini = ini+Who[i][j]*ah[j]
                ao[i] = g(ini)                            # output activation
                deltao[i] = gp(ao[i])*(y[i]-ao[i])  # output error gradient
                           
            for k in range(len(ah)) :
                error = 0.0
                for j in range(len(y)) :             # back propagate to hidden
                    error = error + Who[j][k]*deltao[j]
                deltah[k] = gp(ah[k]) * error   # hidden error gradient

            for j in range(len(ah)) :
                for i in range(len(ao)) :           # update output weights
                    Who[i][j] = Who[i][j] + (alpha * ah[j] * deltao[i])

            for k in range(len(ai)) :
                for j in range(len(ah)) :           # update hidden weights
                    Wih[j][k] = Wih[j][k] + (alpha * ai[k] * deltah[j])

    return network

def BACK_PROP_TEST(examples, network) :
    result = []
    (Wih, Who) = network
    for (x,y) in examples :
        eresult=[]
        ai=x[:]                                           # inputs/outputs
        ah = [0.0]*len(Who[0])
            
        for j in range(len(ah)) :                    # activate hidden layer
            ini = 0.0
            for i in range(len(ai)) :           
                ini = ini+Wih[j][i]*ai[i] 
            ah[j] = g(ini)                               # hidden activation
                
        for k in range(len(y)) :                      # activate output layer
            ini = 0.0
            for j in range(len(ah)) :           
                ini = ini+Who[k][j]*ah[j]

            eresult.append(g(ini))                   # output activation
        result.append(eresult)
    return result
 
XOR = [ ([0,0], [0]),                                 # Training examples
            ([0,1], [1]), 
            ([1,0], [1]), 
            ([1,1], [0])] 
 
NN = [[[0.1, -0.2],                                   # input to hidden weights
           [-0.3, 0.4]],                                  # 2 input and 2 hidden
          [[0.5, -0.6]]                                   # hidden to output weights
        ]                                                     # 2 hidden and 1 output
 
print 'Learning results: ', BACK_PROP_LEARNING(XOR, NN) 
print 'Training and test set: ', XOR 
print 'Test results: ', BACK_PROP_TEST(XOR, NN)
The illustration at right shows the neural net node values (circled) and initial assigned weights after one activation and no learning. The given input is [1,0] for the XOR operation, the calculated output is about 0.1 but 1.0 is correct.

NN = [[[0.1, -0.2],                  # input to hidden weights
           [-0.3, 0.4]],                 # 2 input and 2 hidden
          [[0.5], [-0.6]]               # hidden to output weights
        ]        

Before training, inputs elicited outputs:

 Calculated outputs for XOR input test
 ([0, 0], [0]) -> 0.0
 ([0, 1], [1]) -> -0.35715898221760423
 ([1, 0], [1]) -> 0.1666890968523167
 ([1, 1], [0]) -> -0.21376454980843815

After 1000 training epochs:

NN = [[[0.713, -3.32],            # input to hidden weights
           [0.708, -3.27]],          # 2 input and 2 hidden
          [[-6.47, -5.89]]           # hidden to output weights
        ]        
 

After learning the XOR training set, the [1,0] input now elicits a 0.95 output that more closely approximates the XOR function behavior.

Calculated outputs for XOR input test
 ([0, 0], [0]) -> 0.0
 ([0, 1], [1]) -> 0.95856726189074382
 ([1, 0], [1]) -> 0.95708877361026945
 ([1, 1], [0]) -> 0.12976883523533156

For the initial data:

  • modify the starting weights
  • change the alpha (learning rate) in the program to learn slower and faster
  • add another hidden layer node
  • Creating an algorithm to memorize perfect inputs and produce correct output is not difficult, known as a database.
  • One favorable characteristic of NNs is the ability to classify noisy inputs, imperfect data containing errors.
  • For example, after training the NN on correct XOR function input and output, approximate inputs are presented with the result output:

    Noisy input, correct output and calculated outputs for XOR input test
     ([0, 0],      [0]) -> 0.0
     ([0,.75],    [1]) -> 0.99021728739962134
     ([.85,0],    [1]) -> 0.98180321780465274
     ([.95,.95], [0]) -> 0.22831311312614366

  • The NN is able to classify some noisy inputs correctly.
  • The output results were closer to the correct value in two of the cases where the input had error.

 

Example learning handwriting (if you write in 7-segment display)

Networks can be trained to recognize handwriting, the following trains a network to recognize digits 1, 2, 3  given the representative segments. For example, the digit 2 has all segments except c and f ON. The input corresponding to digit 2 is [1a,1b,0c,1d,1e,0f,1g] and example output is [0,1,0] meaning node 2 should produce the greatest output when the network is given [1,1,0,1,1,0,1] as input.

After training the seven segment input for digits 1, 2 and 3 produces the output:

                      1                                         2                             3

[[0.98017386132142537, -0.0041815462717713298, 0.0045408480990941661],
 [-0.018064341191439331, 0.98439571626010991, 0.031736525224963216],
 [0.0027542312128823383, -0.010894250161767977, 0.97973528278179911]
]

illustrating the network correctly classified the inputs to the corresponding digits.

Below is the problem definition having:

The network at right illustrates input for the digit 1.

Note that only one hidden node is shown with connections.

 

 

seven_segment = [
    ([0,1,1,0,0,0,0], [1,0,0]),    # 1 digit example
    ([1,1,0,1,1,0,1], [0,1,0]),    # 2
    ([1,1,1,1,0,0,1], [0,0,1])     # 3
    ]

NN = [[[0.1, -0.2,  0.1, -0.2,  0.1, -0.2,   0.3],     # input to hidden weights
       [-0.3, 0.1, -0.2,  0.1, -0.2,  0.4,  -0.1],
       [-0.3, 0.1, -0.2,  0.1, -0.2,  0.4,  -0.1],
       [-0.3, 0.1, -0.2,  0.1, -0.2,  0.4,  -0.1]
       ],                                                              # hidden to output weights
      [[0.5, -0.2, -0.1, 0.3], [-0.6, 0.5, -0.2, 0.3], [0.1, -0.2, 0.1, -0.4]]
      ]

print BACK_PROP_LEARNING(seven_segment, NN)
print seven_segment
print BACK_PROP_TEST(seven_segment, NN)