This pedagogical domain is meant as an extremely simple example of
temporal classification. Consider the following scenario
: A
computer company, called SoftCorp, makes an extremely buggy
piece of software; hence they get many irate phone calls to their
technical support department. These phone calls are recorded for later
analysis. SoftCorp discovers that how these phone calls are
handled has a huge impact on future buying patterns of its customers,
so based on the recordings of tech support responses, they are hoping
to find the critical difference between happy and angry customers.
An intelligent engineer suggests that the volume level of the conversation is an indication of frustration level. So SoftCorp takes its recorded conversations and analyses the phone call. They divide each phone call into 30-second segments; and work out the average volume in each segment. If it is a high-volume conversation, it is marked as ``H'', while if it is at a reasonable volume level, it is labelled as ``L''. On some subset of their data (in fact, six customers), they determine whether the tech support calls resulted in happy or angry customers by some independent means. Note that conversations are not of fixed length; some conversations can be dealt with quickly, others take a bit longer.
Six examples of recorded phone conversations (with the process discussed above applied to them) are show in Table 2.1.
SoftCorp would like to employ some kind of machine learning tool to aid in finding rules to predict whether, at the end of a conversation, a customer is likely to be happy or angry, based on observing these volume levels.