Chapter 20
|



Example - classify OR operation
- 2 inputs x0 and x1
- 2 weights W0 and W1
- 1 output a
x0 x1 W0 W1 in a 0 0 0.5 0.5 0.0 0 0 1 0.5 0.5 0.5 1 1 0 0.5 0.5 0.5 1 1 1 0.5 0.5 1.0 1


Positive bias of W0 moves in higher toward threshold of 0.
# Figure 20-15 of AIMA from Russell and Norvig
# Simple feedforward network and demonstration of recognizing AND, OR and NOT
# using threshold activation function
def g(In) : # Threshold activation function
if In > 0.0 : return 1
else : return 0
def PERCEPTRON_TEST(a, W) : # a is inputs, W is weights
In = 0.0 # One output
for j in range(len(W)) :
In = In + W[j]*a[j]
return g(In)
print 'AND' # a[0, 1, 2] W[0, 1, 2]
print PERCEPTRON_TEST([-1, 0, 0], [1.5, 1, 1]) # 0
print PERCEPTRON_TEST([-1, 0, 1], [1.5, 1, 1]) # 0
print PERCEPTRON_TEST([-1, 1, 0], [1.5, 1, 1]) # 0
print PERCEPTRON_TEST([-1, 1, 1], [1.5, 1, 1]) # 1
print 'NOT'
print PERCEPTRON_TEST([-1, 0], [-0.5, -1]) # 1
print PERCEPTRON_TEST([-1, 1], [-0.5, -1]) # 0Give a similar definition for OR operation from the illustration above.





Example - classify AND operation
- 2 inputs x0 and x1
- 2 weights W0 and W1
- 1 output a
x0 x1 W0 W1 in a 0 0 0.5 0.5 0.0 0 0 1 0.5 0.5 0.5 1 1 0 0.5 0.5 0.5 1 1 1 0.5 0.5 1.0 1 Error when output overshoots correct answer.
Can adjust:
- weights
- bias
- threshold function.
Adding a bias of -0.75 gives:
x0 x1 W0 W1 Wbias in a 0 0 0.5 0.5 -0.75 -0.75 0 0 1 0.5 0.5 -0.75 -0.25 0 1 0 0.5 0.5 -0.75 -0.25 0 1 1 0.5 0.5 -0.75 0.25 1

Learning
- Training set consists of example inputs and correct output (x and y).
- Error is the difference between the calculated output for a given input and the correct output.
- Learning occurs by correcting weights when an error occurs.
- W[j] - weight
- alpha - learning rate
- Err - error between calculated, g(summed weighted inputs), and correct output, y
- x[j] - input
- y - correct output
- g - calculated output
W[j] = W[j] + (alpha * Err * x[j])
- If the training set can be represented, the weights will be adjusted to reduce the error.
- Weights contain results of learning, attempts to converge weights to an approximate solution.
- Initial weight specifies an initial point on the solution surface, changing weights moves to a different point on the surface. A two-input/one output perceptron represents a single two-dimensional solution surface.
Example
- Training examples for OR function
- OR training set
x1 x0 y 0 0 0 0 1 1 1 0 1 1 1 1 Initial weights
W0 W1 -0.2 0.4
- Threshold function defines the activation results for a given weighted input, for the example at right, -0.2 results in a 0 output of the perceptron:
def g(In) :
if In > 0.0 : return 1
else : return 0
- Network at right for inputs [1,0] computes output of -0.2 for which the threshold function computes a 0; an error of 1.
- After 1 training epoch, the weights are modified to:
W = [0.0, 0.4]
- Network at right for inputs [1,0] computes output of 0.0 for which the threshold function computes a 0; an error of 1.
- After 2 training epoch, the weights are modified to:
W = [0.2, 0.4]
- Network at right for inputs [1,0] computes output of 0.2 for which the threshold function computes a 1; an error of 0.
- At this point the network is trained to correctly classify OR function behavior.
# Figure 20.21 from AIMA by Russell and Norvig
# using threshold activation function
def PERCEPTRON_LEARNING(examples, network) :
alpha = 0.2
W = network
for epoch in range(3) :
for e in examples :
(x, y) = e
sum = 0.0
for j in range(len(x)) :
sum = sum + W[j]*x[j] # Compute activation output
Err = y - g( sum )
for j in range(len(x)) : # Learn by updating weights
W[j] = W[j] + (alpha * Err * x[j]) # based on error
return W
def PERCEPTRON_TEST(examples, network) :
W = network
result = []
for e in examples :
(x, _) = e
In = 0.0
for j in range(len(x)) :
In = In + W[j]*x[j]
result.append(g(In))
return result
def g(In) : # Threshold activation function
if In > 0.0 : return 1 # Ex. In > 0.0 for OR
else : return 0 # In > .75 for AND
OR = [ ([0,0], 0),
([0,1], 1),
([1,0], 1),
([1,1], 1)]
NN = [-0.2, 0.4]
print PERCEPTRON_LEARNING(OR, NN) # [0.200000000001, 0.40000000002]
print OR # [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print PERCEPTRON_TEST(OR, NN) # [0, 1, 1, 1]
- Give the definition of the AND training set.
- Try different values of the threshold cutoff in function g. How does that affect learning?
Example - Learning OR Boolean operation
- Example perceptrons learning the OR operation in 3 epochs (exposures to training set)
- OR training set
x1 x0 y 0 0 0 0 1 1 1 0 1 1 1 1 - Initial weights are random. Different starting weights will begin learning at a different point on the solution surface and possibly lead to learning a different final weight solution.
W = [-0.2, 0.4]
- Network activation
sum=sum+W[j]*x[j]
- Threshold activation function
def g(In) :
if In > 0.0 : return 1
else : return 0- Learning correction for threshold function
W[j] = W[j] + alpha * Err * x[j]
In first epoch, weights are changed:
What are the new weights? - Second epoch
What is the next activation?
Sample learning OR operation epoch x0 x1 Expected
YActual
YError W0 W1 0 0 0 0 0 0 -0.2 0.4 0 0 1 1 1 0 -0.2 0.4 0 1 0 1 0 1 0.0 0.4 0 1 1 1 1 0 0.0 0.4 1 0 0 0 0 0 0.0 0.4 1 0 1 1 1 0 0.0 0.4 1 1 0 1 0 1 0.2 0.4 1 1 1 1 1 0 0.2 0.4 2 0 0 0 0 0 0.2 0.4 2 0 1 1 1 0 0.2 0.4 2 1 0 1 1 0 0.2 0.4 2 1 1 1 1 0 0.2 0.4
Learning

W[j] = W[j] + alpha*Err*x[j]*gp(in)
Problem
X = w1I1+w2I2
Y = 1 for X > t, 0 for X <= t

Problem
- Hidden units produce output that is summed giving network output.
- Weights between hidden and output units need adjusting for error.
- No training output available for comparison with hidden unit output to determine error.
- Need means to assign error of each output unit proportionally to each hidden unit.
Back Propagation
Training Algorithm
Inputs: ai0, ai1, ...
1 hidden layer:
Hidden nodes: ah0, ah1, ...
Output nodes: ao0, ...
Weights:
input to hidden nodes: Wih0,0, Wih0,1, Wih1,0, Wih1,1 , ...
hidden to output node: Who0,0, Who1,0 , ...
Activation function: g
Gradient function: gp
for j in range(len(ah)) : # activate hidden layer
ini = 0.0
for k in range(len(ai)) :
ini = ini+Wih[j][k]*ai[k]
ah[j] = g(ini) # hidden activation
for i in range(len(ao)) : # activate output layer
ini = 0.0
for j in range(len(ah)) :
ini = ini+Who[i][j]*ah[j]
ao[i] = g(ini) # output activation
deltao[i] = gp(ao[i])*(y[i]-ao[i]) # output error gradient
for k in range(len(ah)) :
error = 0.0
for j in range(len(y)) : # assign feedback
error = error + Who[j][k]*deltao[j]
deltah[k] = gp(ah[k]) * error # hidden error gradient
for j in range(len(ah)) :
for i in range(len(ao)) : # update output weights
Who[i][j] = Who[i][j] + (alpha * ah[j] * deltao[i])
for k in range(len(ai)) :
for j in range(len(ah)) : # update hidden weights
Wih[j][k] = Wih[j][k] + (alpha * ai[k] * deltah[j])Example
Learn to classify AND, XOR, etc. functions.
2 inputs: ai0, ai1
1 hidden layer:
2 hidden nodes: ah0, ah1
1 output node: ao0
Weights:
input to hidden nodes: Wih0,0, Wih0,1, Wih1,0, Wih1,1
hidden to output node: Who0,0, Who1,0
Activation function approximates threshold function but is everywhere continuous hence differentiable, the hyperbolic tangent function is at right, necessary for gradient descent:
def g(x):
return math.tanh(x)
Derivative of activation function.
def gp(y):
return 1.0-y*yBack propagation
Output error included in proportioning hidden error when adjusting hidden weight
# Figure 20.25 from AIMA by Russell and Norvig. Back-propagation Neural Net import math def g(x): # tanh faster than the standard 1/(1+e^-x) return math.tanh(x) def gp(y): # derivative of g return 1.0-y*y def BACK_PROP_LEARNING(examples, network) : alpha = 0.2 (Wih, Who) = network for epoch in range(4000) : for (x,y) in examples : ai=x[:] # inputs/outputs deltao = [0.0]*len(y) deltah = [0.0]*len(Who[0]) ah = [0.0]*len(Who[0]) ao = [0.0]*len(y) for j in range(len(ah)) : # activate hidden layer ini = 0.0 for k in range(len(ai)) : ini = ini+Wih[j][k]*ai[k] ah[j] = g(ini) # hidden activation for i in range(len(ao)) : # activate output layer ini = 0.0 for j in range(len(ah)) : ini = ini+Who[i][j]*ah[j] ao[i] = g(ini) # output activation deltao[i] = gp(ao[i])*(y[i]-ao[i]) # output error gradient for k in range(len(ah)) : error = 0.0 for j in range(len(y)) : # back propagate to hidden error = error + Who[j][k]*deltao[j] deltah[k] = gp(ah[k]) * error # hidden error gradient for j in range(len(ah)) : for i in range(len(ao)) : # update output weights Who[i][j] = Who[i][j] + (alpha * ah[j] * deltao[i]) for k in range(len(ai)) : for j in range(len(ah)) : # update hidden weights Wih[j][k] = Wih[j][k] + (alpha * ai[k] * deltah[j]) return network def BACK_PROP_TEST(examples, network) : result = [] (Wih, Who) = network for (x,y) in examples : eresult=[] ai=x[:] # inputs/outputs ah = [0.0]*len(Who[0]) for j in range(len(ah)) : # activate hidden layer ini = 0.0 for i in range(len(ai)) : ini = ini+Wih[j][i]*ai[i] ah[j] = g(ini) # hidden activation for k in range(len(y)) : # activate output layer ini = 0.0 for j in range(len(ah)) : ini = ini+Who[k][j]*ah[j] eresult.append(g(ini)) # output activation result.append(eresult) return result XOR = [ ([0,0], [0]), # Training examples ([0,1], [1]), ([1,0], [1]), ([1,1], [0])] NN = [[[0.1, -0.2], # input to hidden weights [-0.3, 0.4]], # 2 input and 2 hidden [[0.5, -0.6]] # hidden to output weights ] # 2 hidden and 1 output print 'Learning results: ', BACK_PROP_LEARNING(XOR, NN) print 'Training and test set: ', XOR print 'Test results: ', BACK_PROP_TEST(XOR, NN)
The illustration at right shows the neural net node values (circled) and initial assigned weights after one activation and no learning. The given input is [1,0] for the XOR operation, the calculated output is about 0.1 but 1.0 is correct.
NN = [[[0.1, -0.2], # input to hidden weights
[-0.3, 0.4]], # 2 input and 2 hidden
[[0.5], [-0.6]] # hidden to output weights
]Before training, inputs elicited outputs:
Calculated outputs for XOR input test
([0, 0], [0]) -> 0.0
([0, 1], [1]) -> -0.35715898221760423
([1, 0], [1]) -> 0.1666890968523167
([1, 1], [0]) -> -0.21376454980843815After 1000 training epochs:
NN = [[[0.713, -3.32], # input to hidden weights
[0.708, -3.27]], # 2 input and 2 hidden
[[-6.47, -5.89]] # hidden to output weights
]
After learning the XOR training set, the [1,0] input now elicits a 0.95 output that more closely approximates the XOR function behavior.
Calculated outputs for XOR input test
([0, 0], [0]) -> 0.0
([0, 1], [1]) -> 0.95856726189074382
([1, 0], [1]) -> 0.95708877361026945
([1, 1], [0]) -> 0.12976883523533156For the initial data:
- modify the starting weights
- change the alpha (learning rate) in the program to learn slower and faster
- add another hidden layer node
- Creating an algorithm to memorize perfect inputs and produce correct output is not difficult, known as a database.
- One favorable characteristic of NNs is the ability to classify noisy inputs, imperfect data containing errors.
- For example, after training the NN on correct XOR function input and output, approximate inputs are presented with the result output:
Noisy input, correct output and calculated outputs for XOR input test
([0, 0], [0]) -> 0.0
([0,.75], [1]) -> 0.99021728739962134
([.85,0], [1]) -> 0.98180321780465274
([.95,.95], [0]) -> 0.22831311312614366- The NN is able to classify some noisy inputs correctly.
- The output results were closer to the correct value in two of the cases where the input had error.
Example learning handwriting (if you write in 7-segment display)
Networks can be trained to recognize handwriting, the following trains a network to recognize digits 1, 2, 3 given the representative
segments. For example, the digit 2 has all segments except c and f ON. The input corresponding to digit 2 is [1a,1b,0c,1d,1e,0f,1g] and example output is [0,1,0] meaning node 2 should produce the greatest output when the network is given [1,1,0,1,1,0,1] as input.
After training the seven segment input for digits 1, 2 and 3 produces the output:
1 2 3
[[0.98017386132142537, -0.0041815462717713298, 0.0045408480990941661],
[-0.018064341191439331, 0.98439571626010991, 0.031736525224963216],
[0.0027542312128823383, -0.010894250161767977, 0.97973528278179911]
]illustrating the network correctly classified the inputs to the corresponding digits.
Below is the problem definition having:
- 7 inputs - one for each segment
- 3 outputs - one for each digit
- 1 hidden layer - that's all this program handles
- 4 hidden nodes
The network at right illustrates input for the digit 1.
Note that only one hidden node is shown with connections.
seven_segment = [ ([0,1,1,0,0,0,0], [1,0,0]), # 1 digit example ([1,1,0,1,1,0,1], [0,1,0]), # 2 ([1,1,1,1,0,0,1], [0,0,1]) # 3 ] NN = [[[0.1, -0.2, 0.1, -0.2, 0.1, -0.2, 0.3], # input to hidden weights [-0.3, 0.1, -0.2, 0.1, -0.2, 0.4, -0.1], [-0.3, 0.1, -0.2, 0.1, -0.2, 0.4, -0.1], [-0.3, 0.1, -0.2, 0.1, -0.2, 0.4, -0.1] ], # hidden to output weights [[0.5, -0.2, -0.1, 0.3], [-0.6, 0.5, -0.2, 0.3], [0.1, -0.2, 0.1, -0.4]] ] print BACK_PROP_LEARNING(seven_segment, NN) print seven_segment print BACK_PROP_TEST(seven_segment, NN)