We would like our classifier to work across as many domains as possible without extensive, arbitrary tuning of parameters. Some existing techniques already perform well when the benefits justify the cost of manual adjustment. The best example is the hidden Markov model, the dominant approach to speech recognition. Hidden Markov models (see section 2.1 and [RJ86]) can recognise speech with high accuracy, but they require a great deal of tweaking, not all of which can be easily derived from domain knowledge.
At the same time, we want to provide a mechanism by which domain knowledge can be included easily, so that (i) the classification task is made easier and (ii) concepts are explained in terms related to the domain itself.