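On a single element $s$, one way to measure success is to
say that the classifier is accurate if $\hat{f}(s) = f(s)$, where
$f(s)$ is the true class of $s$ and $\hat{f}(s)$ is the class
assigned to it, and
inaccurate otherwise.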
This works for most cases. However, in some domains it is too
simplistic: not all inaccuracies are equally bad. For example, consider
a medical TC application involving a diagnosis, where the set of
classes is $\{\mathrm{yes}, \mathrm{no}\}$,
with ``yes'' indicating that the patient has some condition and ``no'' indicating
they do not. A ``false positive'' classification (i.e. misclassifying
a negative as a positive) may not be as bad as a ``false negative''
(i.e. misclassifying a positive as a negative). This is an old problem
in machine learning, and it has been studied in the field known as
cost-sensitive learning.
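Typically, it is solved by introducing a function $c(i, j)$
which tells us the cost of misclassifying an $i$ as a $j$. The
function need not be i-j symmetric, i.e. we may have
$c(i, j) \neq c(j, i)$. In all cases, $c(i, i) = 0$.
We can represent the above simple case (where all errors are equally
bad) as:
\[
c(i, j) = \left\{ \begin{array}{ll}
0 & \mbox{if } i = j \\
1 & \mbox{otherwise}
\end{array} \right.
\]
However, a more complex cost function can always be substituted.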
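To make the two kinds of cost function concrete, the following is a minimal Python sketch. The names `zero_one_cost`, `diagnosis_cost` and `DIAGNOSIS_COST` are hypothetical, and the value 5 for a false negative is an arbitrary assumption used only to show asymmetry. It is not a clinically motivated figure.

```python
# Illustrative sketch only: labels and cost values are assumptions.
CLASSES = ("yes", "no")


def zero_one_cost(true_class, predicted_class):
    """Simple case: a correct answer costs 0, every error costs 1."""
    return 0 if true_class == predicted_class else 1


# Asymmetric costs for the diagnosis example.
# Keys are (true class, predicted class).
DIAGNOSIS_COST = {
    ("yes", "yes"): 0,  # correct positive
    ("no", "no"): 0,    # correct negative
    ("no", "yes"): 1,   # false positive
    ("yes", "no"): 5,   # false negative, assumed five times worse
}


def diagnosis_cost(true_class, predicted_class):
    """Look up the cost of classifying a true_class as predicted_class."""
    return DIAGNOSIS_COST[(true_class, predicted_class)]
```

Both functions give zero cost to a correct classification. Only the second distinguishes between the two kinds of error.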
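Secondly, it is better to get higher accuracy on frequently occurring
elements than on rare ones, so a faithful measure of accuracy must
also take frequency into account. We use the function $p(s)$
to indicate the probability of a stream $s$
occurring in the stream set $S$.
Our goal can therefore be defined as minimising the expected cost
\[
\sum_{s \in S} p(s) \, c(f(s), \hat{f}(s)),
\]
where $c$ is the cost function, $f(s)$ is the true class of $s$ and
$\hat{f}(s)$ is the class assigned to it.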
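The probability-weighted cost described here can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function `expected_cost`, the three toy streams, their probabilities and the two toy classifiers are all hypothetical.

```python
def expected_cost(streams, probability, true_class, classify, cost):
    """Sum, over all streams, the probability of the stream times the
    cost of the class the classifier assigns to it."""
    return sum(
        probability[s] * cost(true_class[s], classify(s))
        for s in streams
    )


def zero_one_cost(true_class, predicted_class):
    """All errors equally bad: 0 if correct, 1 otherwise."""
    return 0 if true_class == predicted_class else 1


# Toy data (assumed): one frequent stream and two rarer ones.
STREAMS = ["a", "b", "c"]
PROBABILITY = {"a": 0.7, "b": 0.2, "c": 0.1}
TRUE_CLASS = {"a": "no", "b": "yes", "c": "no"}


def always_no(stream):
    """Toy classifier that is wrong only on the rarer stream 'b'."""
    return "no"


def wrong_on_frequent(stream):
    """Toy classifier that is wrong only on the frequent stream 'a'."""
    return "yes" if stream in ("a", "b") else "no"


cost_rare_error = expected_cost(
    STREAMS, PROBABILITY, TRUE_CLASS, always_no, zero_one_cost)
cost_frequent_error = expected_cost(
    STREAMS, PROBABILITY, TRUE_CLASS, wrong_on_frequent, zero_one_cost)
```

Each toy classifier makes exactly one error, yet the one that errs on the frequent stream incurs the larger expected cost, which is the behaviour the weighting by probability is meant to capture.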