Figure 2: A linear discrimination between two classes

By function approximation, we describe a surface that separates the objects into different regions. The simplest such function is a line, and linear regression methods and perceptrons are used to find linear discriminant functions.

A perceptron is a simple pattern classifier. Given a binary input vector,
**x**, a weight vector, **w**, and a threshold value, *T*, if

$$\mathbf{w} \cdot \mathbf{x} > T$$

then the output is 1, indicating membership of a class; otherwise it is 0, indicating exclusion from the class. Clearly, $\mathbf{w} \cdot \mathbf{x} = T$ describes a hyperplane, and the goal of perceptron learning is to find a weight vector, **w**, that places the members of the class on one side of that hyperplane and the non-members on the other.
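The threshold test and the classic error-correction learning rule can be sketched as follows. This is a minimal illustration, not the exact procedure from the text; the function names, the learning rate, and the training loop are our own choices.

```python
import numpy as np

def perceptron_output(w, x, T):
    # Output 1 if the weighted sum exceeds the threshold, else 0.
    return 1 if np.dot(w, x) > T else 0

def train_perceptron(samples, labels, T, lr=0.1, epochs=100):
    # Perceptron learning rule: nudge the weights towards the input
    # whenever the output disagrees with the desired label.
    w = np.zeros(len(samples[0]))
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - perceptron_output(w, x, T)
            w = w + lr * error * np.array(x)
    return w
```

For a linearly separable problem such as binary OR, this loop converges to a separating weight vector within a few epochs.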

The perceptron is an example of a *linear threshold unit* (LTU). A single
LTU can only recognise one kind of pattern, provided that the input space is
linearly separable. If we wish to recognise more than one pattern, several
LTUs can be combined. In this case, instead of having a vector of weights, we
have an array, *W*. The output will now be a vector:

$$\mathbf{y} = f(W\mathbf{x})$$

where each element of **y** is the thresholded output of one LTU, computed from its own row of weights in *W*.
LTUs can only produce linear discriminant functions and, consequently, they are limited in the kinds of classes that can be learned. However, it was found that by cascading pattern associators, it is possible to approximate decision surfaces of a higher order than simple hyperplanes. In a cascaded system, the outputs of one pattern associator are fed into the inputs of another, thus:

$$\mathbf{y} = f(W_2 \, f(W_1 \mathbf{x}))$$
This is the scheme that is followed by multi-layer neural nets (Figure 3).

Figure 3. A multi-layer network.

To facilitate learning, a further modification must be made. Rather than using a simple threshold, as in the perceptron, multi-layer networks usually use a non-linear activation such as the sigmoid function:

$$f(x) = \frac{1}{1 + e^{-x}}$$
Like perceptron learning, back-propagation attempts to reduce the errors between the output of the network and the desired result. However, assigning blame for errors to hidden units (i.e. nodes in the intermediate layers) is not so straightforward. The error of the output units must be propagated back through the hidden units. The contribution that a hidden unit makes to an output unit is related to the strength of the weight on the link between the two units and the level of activation of the hidden unit when the output unit was given the wrong level of activation. This can be used to estimate the error value for a hidden unit in the penultimate layer, and that can, in turn, be used to make error estimates for earlier layers.
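The error estimate for a hidden unit described above can be sketched as follows, assuming sigmoid activations (so the derivative is h(1 − h)); the function and argument names are our own:

```python
def hidden_errors(hidden_acts, W_out, output_errors):
    # Back-propagation step: each hidden unit's error estimate is the
    # sum of the output errors it contributed to, weighted by the
    # connecting weights, scaled by the sigmoid derivative h * (1 - h).
    errs = []
    for j, h in enumerate(hidden_acts):
        downstream = sum(W_out[k][j] * output_errors[k]
                         for k in range(len(output_errors)))
        errs.append(h * (1.0 - h) * downstream)
    return errs
```

Applying the same step repeatedly propagates error estimates from the output layer back through every earlier layer.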

Despite the non-linear threshold, multi-layer networks can still be thought of as describing a complex collection of hyperplanes that approximate the required decision surface.

Connectionist learning algorithms are still computationally expensive. A critical factor in their speed is the encoding of the inputs to the network. Encoding is also critical to genetic algorithms, and we will illustrate the problem in the next section.
