|Single Layer Perceptron|
A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold and -1 otherwise. More precisely, given inputs x1 through xn, the output o(x1, ..., xn) computed by the perceptron is

o(x1, ..., xn) = 1 if w0 + w1x1 + ... + wnxn > 0, and -1 otherwise,

where each wi is a real-valued weight.
Notice the quantity (-w0) is a threshold that the weighted combination of inputs w1x1+...+wnxn must surpass in order for the perceptron to output a 1.
Learning a perceptron involves choosing values for the weights w0, ..., wn. Therefore, the space of hypotheses in perceptron learning is the set of all possible real-valued weight vectors.
A single perceptron can be used to represent many boolean functions. For example, if we assume boolean values of 1 (true) and -1 (false), then one way to use a two-input perceptron to implement the AND function is to set the weights w0 = -0.8 and w1 = w2 = 0.5.
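The AND example above can be sketched directly in Python; the function name perceptron and the default arguments are illustrative, not from the text, but the weights are the ones given there.

```python
# A two-input perceptron implementing AND with the weights from the text:
# w0 = -0.8, w1 = w2 = 0.5, using boolean values 1 (true) and -1 (false).
def perceptron(x1, x2, w0=-0.8, w1=0.5, w2=0.5):
    # Output 1 if the weighted sum exceeds the threshold (-w0), else -1.
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1
```

Only the input (1, 1) pushes the weighted sum (0.2) above zero; any false input drags it to -0.8 or below.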
In fact, AND and OR can be viewed as special cases of m-of-n functions: that is, functions where at least m of the n inputs to the perceptron must be true. However, some boolean functions cannot be represented by a single perceptron, such as the XOR function.
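A minimal sketch of an m-of-n function as a single perceptron: give every input the same weight (0.5 here) and choose the bias w0 so that the unit fires exactly when at least m of the n inputs are true. This particular weight assignment is one possible choice, assumed for illustration.

```python
def m_of_n(inputs, m):
    """Return 1 if at least m of the inputs (each 1 or -1) are true, else -1."""
    n = len(inputs)
    # With equal weights 0.5, a vector with k true inputs sums to k - n/2,
    # so placing the threshold halfway between m-1 and m true inputs gives:
    w0 = n / 2 + 0.5 - m
    s = w0 + sum(0.5 * x for x in inputs)
    return 1 if s > 0 else -1
```

Setting m = n recovers AND, and m = 1 recovers OR, matching the claim that both are special cases.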
Figure: the decision surface represented by a two-input perceptron; x1 and x2 are the perceptron inputs.
How does a single perceptron learn the weights? The precise learning problem is to determine a weight vector that causes the perceptron to produce the correct +1 or -1 output for each of the given training examples.
One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the weights whenever it misclassifies an example. Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi:

wi <- wi + eta * (t - o) * xi,

where t is the target output, o is the perceptron output, and eta is the learning rate.
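The training procedure described above can be sketched as follows; the function name train_perceptron and the defaults eta = 0.1 and epochs = 100 are illustrative assumptions.

```python
def train_perceptron(examples, eta=0.1, epochs=100):
    """examples: list of (x, t) pairs, x a list of inputs, t in {1, -1}.
    Sketch of the perceptron training rule: wi <- wi + eta*(t - o)*xi."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                 # w[0] is the bias weight w0 (input x0 = 1)
    for _ in range(epochs):
        converged = True
        for x, t in examples:
            xs = [1.0] + list(x)        # prepend the constant input x0 = 1
            o = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else -1
            if o != t:
                converged = False
                for i in range(len(w)):
                    w[i] += eta * (t - o) * xs[i]   # perceptron training rule
        if converged:                   # all examples classified correctly
            break
    return w
```

On linearly separable data such as AND, this loop terminates with a weight vector that classifies every training example correctly.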
Although the perceptron rule finds a successful weight vector when the training examples are linearly separable, it can fail to converge if the examples are not linearly separable.
Gradient descent searches the hypothesis space of possible weight vectors, even when the training examples are not linearly separable, to find the weights that best fit the training examples.
Training error measures the difference between the target outputs and the unit's outputs over the training examples. Mathematically, it is defined as

E(w) = (1/2) * sum over d in D of (t_d - o_d)^2,

where D is the set of training examples, t_d is the target output for example d, and o_d is the output of the linear unit for example d.
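The error definition above translates directly to code; training_error is an illustrative helper name, and the unit here is the linear (unthresholded) unit that gradient descent is applied to.

```python
def training_error(w, examples):
    """E(w) = 1/2 * sum over d of (t_d - o_d)^2 for a linear unit o = w . x."""
    E = 0.0
    for x, t in examples:
        xs = [1.0] + list(x)            # constant input x0 = 1 for the bias w[0]
        o = sum(wi * xi for wi, xi in zip(w, xs))   # linear (unthresholded) output
        E += 0.5 * (t - o) ** 2
    return E
```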
Figure: the error surface E plotted as a function of the weight vector.
Gradient descent is an algorithm that searches along the direction of steepest descent in the error space. It determines a weight vector that minimizes E by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps. For a linear unit, this error surface has a single global minimum, so gradient descent continues the search until that global minimum is reached.
Each training example is a pair of the form <x, t>, where x is the vector of input values and t is the target output value. Applying the gradient descent rule to the error function E yields the weight update

delta wi = eta * sum over d in D of (t_d - o_d) * x_id,

where x_id is the i-th input value for training example d.
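The batch update rule above can be sketched as a short training loop; gradient_descent is an illustrative name, and eta = 0.05 and steps = 500 are assumed values chosen so the AND example below converges.

```python
def gradient_descent(examples, eta=0.05, steps=500):
    """Batch gradient descent for a linear unit:
    delta_wi = eta * sum over d of (t_d - o_d) * x_id."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                 # w[0] is the bias weight w0 (input x0 = 1)
    for _ in range(steps):
        delta = [0.0] * len(w)
        for x, t in examples:           # accumulate the gradient over all examples
            xs = [1.0] + list(x)
            o = sum(wi * xi for wi, xi in zip(w, xs))   # linear output
            for i in range(len(w)):
                delta[i] += eta * (t - o) * xs[i]
        for i in range(len(w)):         # take one step against the gradient of E
            w[i] += delta[i]
    return w
```

On the AND examples with values in {1, -1}, this converges toward the least-squares weights (-0.5, 0.5, 0.5), the single global minimum of E for a linear unit.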
From this we can see that each weight wi is updated in proportion to the error between the target and the output, scaled by the learning rate.