Strong TC is a little harder to assess, because the types of error that can occur are more complex.
One possible initial definition would be the following: the
classification of an unseen stream is correct if and only if the
predicted and actual class sequences are identical.
However, in many domains this is over-simplistic. If there are two
classes
and
, then the class sequence
is in some sense
``closer'' to
than
is, even though the above
classification system would consider them both equally wrong.
To solve this problem,
the notion of edit distance is introduced. Under an edit distance, strings
are not simply ``correct'' or ``incorrect''; instead, some strings are
closer to a given string than others. One such measure is the Levenshtein
distance [Lev66]. The Levenshtein distance between two strings is the
minimum number of single-symbol differences between them. These
differences can take three forms: the insertion of a symbol, the deletion
of a symbol, or the substitution of one symbol for another.
Of course, there is more than one way to get from one string to another. For example, a substitution can be thought of as a deletion and an insertion. However, the Levenshtein distance is defined as the minimum number of changes to go from one string to the other.
The Levenshtein distance measure can be used on class sequences. Each class maps to a symbol in the alphabet, and each class sequence maps to a string. Other distance measures may be appropriate for different domains.
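As an illustration of how this measure might be computed over class sequences, the following is a minimal sketch using the standard dynamic-programming recurrence. The function name `levenshtein` and the class labels in the usage note are hypothetical and chosen only for illustration; unit cost for every insertion, deletion and substitution is assumed, which a particular domain may wish to change.

```python
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Assumes every edit has unit cost (an assumption of this sketch).
    """
    # prev[j] is the distance between the symbols of a seen so far
    # (initially none) and the first j symbols of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        # Turning the first i symbols of a into the empty sequence takes i deletions.
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x by y (free if they match)
            ))
        prev = curr
    return prev[-1]
```

Because the function accepts any sequence of hashable items, a class sequence can be passed directly as a list of labels, for example `levenshtein(["walk", "run", "walk"], ["walk", "run", "run"])`, without first encoding each class as a character. Taking the minimum over the three options at every step is what ensures that a substitution is counted as one change rather than as a deletion followed by an insertion.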
Again, we should give greater weight to those elements of the stream
set which are likely to occur. The strong temporal classification task
can now be defined as minimising: