This is unique to strong TC. Clearly, not every possible sequence of classes can be learned. Consider the speech recognition task, this would be like trying to learn how to recognise every possible sentence. Any training set is unlikely to contain even a small percentage of the possible sentences; how would it be possible to make a system that classifies all possible sentences?
The most obvious solution is to break the training sentences into
individual words, learn to recognise them, and then
somehow combine these individual word recognisers into something that
can recognise whole sentences. However, this introduces other
problems. If all that is given is a training stream and the
corresponding class sequence, when does one class in the
class sequence end and the next begin?
How are ``transition periods'' from one class to the next handled?
How does one cope with the problem that classes in the sequence are
not independent
? If the stream is segmented incorrectly then any
learner will then get the wrong data: one class will get a part of a
stream that belongs to the next class, and the other will get less
frames then it should get.