## What is cross-entropy, and why use it?

The cross-entropy measure has been used as an alternative to squared error.
Cross-entropy can be used as an error measure when a network's outputs can
be thought of as representing independent hypotheses (e.g. each node stands
for a different concept), and the node activations can be understood as
representing the probability (or confidence) that each hypothesis might
be true. In that case, the output vector represents a probability distribution,
and our error measure - cross-entropy - indicates the distance between what
the network believes this distribution should be, and what the teacher says
it should be.
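
To make the "distance between distributions" idea concrete, here is a minimal sketch of the binary cross-entropy sum over independent output nodes. The function name and the small `eps` guard against `log(0)` are my own additions, not from the excerpt:

```python
import math

def cross_entropy(targets, outputs, eps=1e-12):
    """Cross-entropy between teacher targets t and network outputs o,
    summed over independent output nodes (binary form):
    E = -sum_i [ t_i * log(o_i) + (1 - t_i) * log(1 - o_i) ]"""
    return -sum(t * math.log(o + eps) + (1 - t) * math.log(1 - o + eps)
                for t, o in zip(targets, outputs))

# Outputs close to the teacher's targets give a small error;
# confident but wrong outputs give a large one.
print(cross_entropy([1.0, 0.0], [0.9, 0.1]))  # small
print(cross_entropy([1.0, 0.0], [0.1, 0.9]))  # large
```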
There is a practical reason to use cross-entropy as well. It may be more useful
in problems in which the targets are 0 and 1 (though the outputs may, of course,
take values in between). Cross-entropy tends to allow errors to change weights
even when nodes saturate, i.e. when their derivatives are asymptotically
close to 0.
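
The saturation point can be seen in the output-node error signal for a sigmoid unit. With squared error the signal carries a factor o(1 - o), which vanishes as the output saturates; with cross-entropy that factor cancels, leaving (o - t). A small illustrative sketch (the variable names are mine):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A saturated node: target is 1, but the net input is strongly
# negative, so the output sits very close to 0.
t, net = 1.0, -8.0
o = sigmoid(net)

# Error signal dE/dnet at the output node:
grad_squared = (o - t) * o * (1 - o)  # squared error: scaled by o*(1-o)
grad_ce = (o - t)                     # cross-entropy: the o*(1-o) term cancels

print(grad_squared)  # near zero -- learning stalls
print(grad_ce)       # near -1   -- weights still move
```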

This is an excerpt from p. 166 of Plunkett and Elman: Exercises in Rethinking Innateness, MIT Press, 1997.

Wikipedia article on cross entropy