Neural Networks and Error Backpropagation Learning

Reference: Haykin, chapter 4

Aim:
To introduce some basic concepts of neural networks and then describe the backpropagation learning algorithm for feedforward neural networks.
Keywords: activation level, activation function, axon, backpropagation, backward pass in backpropagation, bias, biological neuron, cell body, clamping, connectionism, delta rule, dendrite, epoch, error backpropagation, error surface, excitatory connection, feedforward networks, firing, forward pass in backpropagation, generalization in backprop, generalized delta rule, gradient descent, hidden layer, hidden unit / node, inhibitory connection, input unit, layer in a neural network, learning rate, linear threshold unit, local minimum, logistic function, momentum in backprop, multilayer perceptron (MLP), neural network, neurode, neuron (artificial), node, output unit, over-fitting, perceptron, perceptron learning, recurrent network, sequence prediction tasks, sigmoidal nonlinearity, simple recurrent network, squashing function, stopping criterion in backprop, synapse, target output, threshold, total net input, total sum-squared error, trainable weight, training pattern, unit, weight, weight space, XOR problem
Plan:
  • linear threshold units, perceptrons
  • outline of biological neural processing
  • artificial neurons and the sigmoid function
  • error backpropagation learning
    • delta rule
    • forward and backward passes
    • generalized delta rule
    • initialization
    • example: XOR with bp in tlearn
    • generalization and over-fitting
    • applications of backprop


Classification Tasks


Classification Tasks 2


History: Perceptrons

Perceptron diagram

Perceptron Learning (Outline)


Perceptron Learning (Outline) 2


Neural Models of Computation


Biological Neurons and Artificial Neurons


Multilayer Perceptrons


Overall Layout of MLP


Node Internals


The Error Back-Propagation Learning Algorithm


Weight Change Equation

Δwji = η δj yi, where η is the learning rate, yi is the output of node i that feeds node j via the weight wji, and the local gradient δj is computed as follows:

  1. If node j is an output node, then δj is the product of φ'(vj) and the error signal ej, where φ(·) is the logistic function, vj is the total net input to node j (i.e. Σi wjiyi), and ej is the difference between the desired output dj and the actual output yj of node j;

  2. If node j is a hidden node, then δj is the product of φ'(vj) and the weighted sum of the δ's computed for the nodes in the next hidden or output layer that are connected to node j.
    [The actual formula is δj = φ'(vj) Σk δkwkj, where k ranges over those nodes for which wkj is non-zero (i.e. nodes k that actually have connections from node j). The δk values have already been computed, as those nodes are in the output layer (or in a layer closer to the output layer than node j).]
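
The two cases above can be sketched in Python (a minimal illustration, assuming the logistic activation throughout; the function names are ours, not part of any simulator):

```python
import math

def logistic(v):
    # phi(v) = 1 / (1 + exp(-v))
    return 1.0 / (1.0 + math.exp(-v))

def logistic_deriv(v):
    # phi'(v) = phi(v) * (1 - phi(v))
    y = logistic(v)
    return y * (1.0 - y)

def delta_output(v_j, d_j, y_j):
    # Case 1: output node -- delta_j = phi'(v_j) * e_j, with e_j = d_j - y_j
    return logistic_deriv(v_j) * (d_j - y_j)

def delta_hidden(v_j, deltas_k, w_kj):
    # Case 2: hidden node -- delta_j = phi'(v_j) * sum over k of delta_k * w_kj,
    # where k ranges over next-layer nodes connected to node j
    return logistic_deriv(v_j) * sum(dk * w for dk, w in zip(deltas_k, w_kj))
```

Note that the hidden-node case reuses the δk values already computed for the layer closer to the output, which is why the algorithm works backwards from the output layer.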


Two Passes of Computation

FORWARD PASS: weights fixed, input signals propagated through network and outputs calculated. Outputs oj are compared with desired outputs dj; the error signal ej = dj - oj is computed.

BACKWARD PASS: starts at the output layer and recursively computes the local gradient δj for each node. The weights are then updated using the equation for Δwji above, and the cycle continues with another forward pass.
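
As a concrete sketch of the two passes, here is one training step for a tiny 2-2-1 network in Python (illustrative only: the initial weights, the learning rate of 0.5, and the absence of bias nodes are our assumptions; a real run would normally include bias terms):

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

ETA = 0.5                                # learning rate (assumed value)
w_hidden = [[0.5, -0.5], [0.3, 0.4]]     # w_hidden[j][i]: input i -> hidden j
w_out = [0.8, -0.2]                      # hidden j -> output node

def train_step(x, d):
    # FORWARD PASS: weights fixed, input signals propagated, output computed
    v_h = [sum(w_hidden[j][i] * x[i] for i in range(2)) for j in range(2)]
    y_h = [logistic(v) for v in v_h]
    v_o = sum(w_out[j] * y_h[j] for j in range(2))
    y_o = logistic(v_o)
    e = d - y_o                          # error signal e = d - o

    # BACKWARD PASS: local gradients first (using the old weights),
    # then weight updates via delta_w_ji = eta * delta_j * y_i
    delta_o = y_o * (1 - y_o) * e
    delta_h = [y_h[j] * (1 - y_h[j]) * delta_o * w_out[j] for j in range(2)]
    for j in range(2):
        w_out[j] += ETA * delta_o * y_h[j]
        for i in range(2):
            w_hidden[j][i] += ETA * delta_h[j] * x[i]
    return e
```

Repeatedly presenting the same training pattern makes the error signal shrink, since each weight update is a gradient-descent step on the squared error for that pattern.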

Sigmoidal Nonlinearity

With the logistic function φ(x) = 1/(1 + e^-x), it is the case that φ'(vj) = yj(1 - yj), where yj = φ(vj); this identity simplifies the computations.
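
The identity can be checked numerically against a central-difference approximation of the derivative (an illustrative check, not part of the algorithm):

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def numeric_deriv(f, v, h=1e-6):
    # central-difference approximation to f'(v)
    return (f(v + h) - f(v - h)) / (2.0 * h)

v = 0.7
y = logistic(v)              # y_j = phi(v_j)
analytic = y * (1.0 - y)     # phi'(v_j) = y_j * (1 - y_j)
assert abs(analytic - numeric_deriv(logistic, v)) < 1e-6
```

The practical benefit is that the derivative comes for free from the node's output yj, with no extra exponentials to evaluate during the backward pass.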


Rate of Learning


Stopping Criterion

Two commonly used stopping criteria are:

Initialization


The Discovery of Backprop

  • The backprop algorithm was discovered by three groups at around the same time.

  • First chronologically was Paul Werbos, who published a more general version in his PhD in 1974. He subsequently "spent many years struggling with folks who refused to listen or publish or tolerate the idea. Finally, in 1981 [he] published a more persuasive brief paper [on it]. Both that paper and the thesis are reprinted in entirety in P. Werbos, Roots of Backpropagation, Wiley 1994." [Source: email from Paul Werbos.] You can find material by Paul Werbos on his algorithm at http://www.werbos.com/AD2004.pdf
    He is now a program manager in the US National Science Foundation.

  • Rumelhart, Hinton and Williams published their version of the algorithm in the mid-1980s. Rumelhart and McClelland produced/edited a two-volume book that included the RHW chapter on backprop, and chapters on a wide range of other neural network models, in 1986. This book, known humorously as the "PDP Bible", was extremely influential. Rumelhart did further important work on neural networks before succumbing to a neurodegenerative disease. The Rumelhart Prize, the "Nobel Prize" of Cognitive Science, was established in his honour. Hinton is still going strong, at the University of Toronto. He won the first Rumelhart Prize in 2001. Williams is at Northeastern University, in Massachusetts.

  • Yann le Cun, a PhD student in Paris, independently discovered the algorithm (PhD completed in 1987). He subsequently joined Hinton's group for a while. He now works at the Courant Institute of Mathematical Sciences at New York University.

Picture of Paul Werbos
Paul Werbos
Picture of Dave Rumelhart
Dave Rumelhart
Picture of Geoffrey Hinton
Geoff Hinton
Picture of Ronald Williams
Ronald Williams
Picture of Yann le Cun
Yann le Cun

The XOR Problem


Backprop Specification in tlearn


Backprop Specification in tlearn 2


Backprop Specification in tlearn 3

Error graph for a run of the xor model in tlearn

Backprop as a Black Art

The tricky things about backprop networks include designing the network architecture (e.g. the number of hidden layers and the number of nodes in each) and then setting the adjustable parameters (e.g. the learning rate and the momentum). It also turns out to be advisable to stop training early, for the sake of better generalization performance.
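
Stopping early can be sketched as the following loop (a hypothetical illustration: train_step and validation_error are assumed callbacks, not functions of any simulator; `patience` is the number of epochs tolerated without improvement on a held-out validation set):

```python
def train_with_early_stopping(train_step, validation_error,
                              max_epochs=1000, patience=10):
    # Stop when validation error has not improved for `patience` epochs,
    # to avoid over-fitting the training data.
    best = float("inf")
    bad_epochs = 0
    for epoch in range(max_epochs):
        train_step()                    # one epoch of backprop training
        err = validation_error()        # error on held-out validation data
        if err < best:
            best, bad_epochs = err, 0   # validation error improved
        else:
            bad_epochs += 1             # no improvement this epoch
            if bad_epochs >= patience:
                break                   # stop before over-fitting worsens
    return best
```

In practice one would also save the weights from the epoch with the lowest validation error and use those, rather than the final weights, for testing.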


Generalization


Testing


Successful Applications of Backprop


Discussion


Summary: Error Backpropagation Learning
  • After briefly describing linear threshold units, the neural network computation paradigm in general, and the use of the logistic function (or similar functions) to transform weighted sums of inputs to a neuron, we outlined the error backpropagation learning algorithm.

  • Backprop's performance on the XOR problem was demonstrated using the tlearn backprop simulator.

  • A number of refinements to backprop were looked at briefly, including momentum and a technique to obtain the best generalization ability.

  • Backprop nets learn slowly but compute quickly once they have learned.

  • They can be trained so as to generalize reasonably well.


Revision Topics
perceptrons
perceptron activation rule
perceptron learning rule
perceptrons can't learn XOR, backprop-trained nets can learn XOR
parts of a biological neuron (to the level of detail given in lectures)
artificial neuron model (weights, activation function, nonlinearity, logistic function)
layout of multilayer perceptron
delta rule
generalised delta rule (momentum)
forward and backward pass
stopping criteria and initialisation
generalisation, over-fitting, and how to avoid it
advantages and disadvantages of neural nets vs symbolic AI methods


Copyright © Bill Wilson, 2010, except where another source is acknowledged.
Bill Wilson's contact info

UNSW's CRICOS Provider No. is 00098G