Single Layer Perceptron

- Perceptron
- Representational Power of Perceptrons
- The Perceptron Training Rule
- Gradient descent and the delta rule
- Visualizing the hypothesis space
- Summary of Gradient descent rule
A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs a 1 if the result is greater than some threshold and -1 otherwise. More precisely, given inputs x_1 through x_n, the output o(x_1, ..., x_n) is

o(x_1, ..., x_n) = 1 if w_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n > 0, and -1 otherwise

where each w_i is a real-valued weight that determines the contribution of input x_i to the perceptron output. Notice the quantity (-w_0) is the threshold that the weighted combination of inputs w_1 x_1 + ... + w_n x_n must surpass for the perceptron to output 1. Learning a perceptron involves choosing values for the weights w_0, ..., w_n.

Representational Power of Perceptrons

A single perceptron can be used to represent many boolean functions. For example, if we assume boolean values of 1 (true) and -1 (false), then one way to use a two-input perceptron to implement the AND function is to set the weights w_0 = -0.8 and w_1 = w_2 = 0.5. In fact, AND and OR can be viewed as special cases of m-of-n functions: that is, functions where at least m of the n inputs to the perceptron must be true. However, some boolean functions cannot be represented by a single perceptron, such as the XOR function.
Figure: the decision surface represented by a two-input perceptron, where x1 and x2 are the perceptron inputs.
- (a) A set of training examples and the decision surface of a perceptron that classifies them correctly.
- (b) A set of training examples that is not linearly separable.
How does a single perceptron learn the weights? The precise learning problem is to determine a weight vector that causes the perceptron to produce the correct +1 or -1 output for each of the given training examples. One way to learn an acceptable weight vector is:
- begin with random weights
- iteratively apply the perceptron to each training example
- modify the perceptron weights whenever it misclassifies an example
- repeat this process until the perceptron classifies all training examples correctly
Weights are modified at each step according to the perceptron training rule, which revises each weight w_i as

w_i <- w_i + Δw_i, where Δw_i = η(t - o)x_i

Here t is the target output, o is the perceptron output, and η is a small positive constant called the learning rate.

Gradient descent and the delta rule

Although the perceptron rule finds a successful weight vector when the training examples are linearly separable, it can fail to converge if the examples are not linearly separable. Gradient descent searches the hypothesis space of possible weight vectors, even when the training examples are not linearly separable, to find the weights that best fit the training examples. The training error measures the difference between target and output. Mathematically, it is defined as follows:

E(w) = 1/2 * Σ_{d ∈ D} (t_d - o_d)^2

- D is the set of training examples
- t_d is the target output for training example d
- o_d is the output of the linear unit for training example d
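The error definition above translates directly into code. This is a sketch under my own naming; the tiny dataset is illustrative, and the linear unit's output is the unthresholded combination w_0 + w_1 x_1 + ... + w_n x_n.

```python
# Training error E(w) = 1/2 * sum over d in D of (t_d - o_d)^2,
# where o_d is the output of the (unthresholded) linear unit.

def linear_output(x, w):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def training_error(examples, w):
    return 0.5 * sum((t - linear_output(x, w)) ** 2 for x, t in examples)

# Two one-input examples of the target function t = x:
D = [((0.0,), 0.0), ((1.0,), 1.0)]

print(training_error(D, [0.0, 0.0]))   # outputs 0, 0 -> E = 0.5*(0 + 1) = 0.5
print(training_error(D, [0.0, 1.0]))   # perfect fit -> E = 0.0
```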
Visualizing the hypothesis space

The following graph shows the error surface:
- the w0, w1 plane represents the entire hypothesis space
- the vertical axis gives the error E relative to a set of training examples
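One rough way to picture this surface without plotting is to evaluate E over a grid of (w0, w1) values; the dataset here is my own illustrative choice. For a linear unit the surface is a paraboloid, so the grid search finds a single lowest point.

```python
# Evaluate the error E over a grid of (w0, w1) hypotheses for a
# one-input linear unit, and locate the lowest point on the grid.

def error(w0, w1, examples):
    return 0.5 * sum((t - (w0 + w1 * x)) ** 2 for x, t in examples)

# Examples of the target function t = x, fit exactly by (w0, w1) = (0, 1):
D = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

grid = [(w0 / 10, w1 / 10) for w0 in range(-20, 21) for w1 in range(-20, 21)]
best = min(grid, key=lambda w: error(w[0], w[1], D))
print(best)
```

Because the surface has a single global minimum, the lowest grid point sits at (0.0, 1.0), the weights that reproduce the target function exactly.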
Gradient descent is an algorithm that searches in the direction of steepest descent on the error surface. It determines a weight vector that minimizes E by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps. A linear unit has a single global minimum in this error surface, so gradient descent continues the search until that global minimum is reached.

Summary of Gradient descent rule

Each training example is a pair of the form <x, t>, where x is the vector of input values and t is the target output value. The gradient descent rule updates each weight by

Δw_i = η Σ_{d ∈ D} (t_d - o_d) x_{id}

- D is the set of training examples
- t_d is the target output for training example d
- o_d is the output of the linear unit for training example d
- x_{id} is the i-th input component of training example d

From this we can see that each weight w_i is revised in proportion to the error (t_d - o_d) between target and output, scaled by the learning rate η.
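The gradient descent rule above can be sketched as a batch update loop. This is an illustrative sketch, not the text's own code: the learning rate, step count, and one-input dataset (target t = 2x + 1) are my assumptions, and x_0 = 1 serves as the bias input for w_0.

```python
# Batch gradient descent for a linear unit: accumulate
# delta_w_i = eta * sum over d of (t_d - o_d) * x_id over the whole
# training set, then update all weights at once.

def gradient_descent(examples, eta=0.05, steps=500):
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                 # arbitrary initial weight vector
    for _ in range(steps):
        delta = [0.0] * (n + 1)
        for x, t in examples:
            o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            delta[0] += eta * (t - o)   # bias input x_0 = 1
            for i, xi in enumerate(x):
                delta[i + 1] += eta * (t - o) * xi
        w = [wi + dwi for wi, dwi in zip(w, delta)]
    return w

# Fit the linear target t = 2*x + 1:
D = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0)]
w = gradient_descent(D)
print(w)   # approaches [1.0, 2.0], the global minimum of E
```

Because the error surface for a linear unit has a single global minimum, a small enough η lets these repeated steps approach the best-fit weights regardless of the starting point.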