Next: Examples Up: Formulation of the problem Previous: Statement of the problem

Assessing success

In defining the above task, we have implied that the function tex2html_wrap_inline1805 should, in some way, ``approximate'' the function tex2html_wrap_inline1859. What exactly do we mean by this, and how do we measure how closely the predicted class functions match the actual class functions?

In general we cannot measure this directly, unless tex2html_wrap_inline1861. However, we can at least define a theoretical measure of accuracy.

On a single element S, one way to measure success is to call the classification accurate if tex2html_wrap_inline1865, and inaccurate otherwise.

This works for most cases. In some domains, however, it is too simplistic: some errors are worse than others. For example, consider a medical temporal classification application involving a diagnosis, where tex2html_wrap_inline1867, with ``yes'' indicating that the patient has some condition and ``no'' indicating that the patient does not. A ``false positive'' classification (i.e. misclassifying a negative as a positive) may not be as bad as a ``false negative'' (i.e. misclassifying a positive as a negative). In the sign language domain, misclassifying ``bad'' as ``unwell'' may not be as bad as misclassifying ``bad'' as ``good''.

To solve this problem, we introduce a function cost(i,j), which gives the cost of misclassifying an i as a j. The function need not be symmetric in i and j, i.e. in general cost(i,j) need not equal cost(j,i). Typically, of course, cost(i,i) = 0.

We can represent the above simple case (where all errors are equally bad) as:

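\begin{eqnarray*}
cost(i,j) & = & 0 \quad \mbox{if } i = j \\
cost(i,j) & = & 1 \quad \mbox{otherwise}
\end{eqnarray*}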
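As a rough illustration, the simple case and an asymmetric alternative might be written as follows. This is only a sketch: the class labels follow the diagnosis example, but the names `zero_one_cost`, `asymmetric_cost` and `ASYMMETRIC_COST`, and the numeric costs, are invented here and do not come from the text.

```python
# Hypothetical class labels, taken from the diagnosis example in the text.
CLASSES = ("yes", "no")


def zero_one_cost(actual, predicted):
    """Simple case: a correct answer costs 0, any misclassification costs 1."""
    return 0.0 if actual == predicted else 1.0


# Asymmetric alternative. The numbers are invented for illustration only:
# a false negative (actual "yes", predicted "no") is charged five times as
# much as a false positive (actual "no", predicted "yes").
ASYMMETRIC_COST = {
    ("yes", "yes"): 0.0,
    ("yes", "no"): 5.0,
    ("no", "yes"): 1.0,
    ("no", "no"): 0.0,
}


def asymmetric_cost(actual, predicted):
    """Look up cost(i, j): the cost of misclassifying an i as a j."""
    return ASYMMETRIC_COST[(actual, predicted)]
```

A table keyed on (actual, predicted) pairs keeps the asymmetry explicit; for a large number of classes a matrix indexed by class number would serve the same purpose.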

Another complication is that it does not always make sense to treat every element of tex2html_wrap_inline1763 as equally important. For example, it is better to achieve high accuracy on frequently occurring signs than on infrequent ones. A more faithful measure of accuracy must therefore take frequency of occurrence into account. We use the function tex2html_wrap_inline1879 to indicate the probability that a stream S occurs in the stream set tex2html_wrap_inline1763.
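To see what the weighting changes, consider a small sketch. Everything in it is hypothetical: the stream identifiers, the probabilities, the two sets of predictions, and the helper names `expected_cost` and `zero_one_cost` are invented for illustration; only the sign labels are borrowed from the examples above.

```python
def zero_one_cost(actual, predicted):
    """All errors equally bad: 0 for a correct answer, 1 otherwise."""
    return 0.0 if actual == predicted else 1.0


def expected_cost(streams, actual, predicted, prob, cost):
    """Sum over streams of misclassification cost times probability of occurrence."""
    return sum(cost(actual[s], predicted[s]) * prob[s] for s in streams)


# Hypothetical stream set with made-up occurrence probabilities (summing to 1).
STREAMS = ("s1", "s2", "s3")
ACTUAL = {"s1": "good", "s2": "bad", "s3": "unwell"}
PROB = {"s1": 0.7, "s2": 0.2, "s3": 0.1}

# Two hypothetical classifiers, each making exactly one error.
PRED_A = {"s1": "bad", "s2": "bad", "s3": "unwell"}  # wrong on the frequent s1
PRED_B = {"s1": "good", "s2": "bad", "s3": "good"}   # wrong on the rare s3

cost_a = expected_cost(STREAMS, ACTUAL, PRED_A, PROB, zero_one_cost)
cost_b = expected_cost(STREAMS, ACTUAL, PRED_B, PROB, zero_one_cost)
```

Counted without weights, both classifiers make one error in three. Once each error is weighted by how often its stream occurs, the classifier that fails on the rare stream scores better.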

Our goal can therefore be defined as finding:

displaymath1819

In other words, we are trying to find the function tex2html_wrap_inline1797 which minimises the sum, taken over the whole of tex2html_wrap_inline1763, of the cost of misclassification times the probability of occurrence; that is, the expected cost of misclassification.
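Written out, the objective described in words above might take the following form. The notation is an assumption made for this sketch and may differ from the original symbols: $\mathcal{S}$ stands for the stream set, $c(S)$ for the actual class of a stream $S$, $\hat{c}(S)$ for its predicted class, and $P(S)$ for its probability of occurrence; only the name cost is taken from the text.

```latex
\[
  \arg\min_{\hat{c}} \;
  \sum_{S \in \mathcal{S}} cost\bigl(c(S), \hat{c}(S)\bigr) \, P(S)
\]
```

When all errors are equally bad (cost 0 for a correct classification and 1 otherwise), this sum reduces to the total probability of the streams that are misclassified.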



Mohammed Waleed Kadous
Tue Oct 6 13:04:40 EST 1998