A summary of how metafeatures are applied is:
To explain the application of metafeatures, we present a simple pedagogical domain. Suppose there is a mythical company called SoftCorp that develops and provides technical support for software. Tech Support calls are recorded for later analysis. SoftCorp wants to find the critical difference between happy and angry customers.
An engineer suggests that the volume level of the conversation indicates the customer's frustration level. Each call is therefore divided into 30-second segments, and the average volume in each segment is calculated. A high-volume segment is marked ``H'', while a segment at a reasonable volume is marked ``L''. For a subset of the data (in fact, six customers), SoftCorp determines by some independent means whether each tech support call resulted in a happy or an angry customer. These are shown in Table 1.
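The segment labelling described above could be sketched as follows. This is only an illustration: the function name `label_segments`, the per-second volume readings and the `threshold` value are all assumptions, since the text does not say how ``high'' volume is defined.

```python
def label_segments(volumes, segment_len=30, threshold=70.0):
    """Label each segment 'H' or 'L' from its average volume.

    volumes     -- one volume reading per second (hypothetical input format)
    segment_len -- readings per segment (30 seconds in the SoftCorp example)
    threshold   -- hypothetical cut-off between reasonable and high volume
    """
    labels = []
    for i in range(0, len(volumes), segment_len):
        segment = volumes[i:i + segment_len]
        mean = sum(segment) / len(segment)
        labels.append("H" if mean > threshold else "L")
    return "".join(labels)


# A made-up call: 30 quiet seconds, 30 loud seconds, 15 quiet seconds.
print(label_segments([60.0] * 30 + [80.0] * 30 + [60.0] * 15))
```

A trailing partial segment is averaged over the readings it actually contains; whether such a segment should be kept or dropped is a design choice the text leaves open.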
One expert advises that ``runs'' of high-volume conversation (continuous periods where the conversation runs at a high volume level) are important for classification purposes. A run of loud volume could be represented as a tuple consisting of the start time of the run and its duration. This is our first metafeature, called LoudRun.
Each instance can now be characterised as having a set of LoudRun events; the LoudRun events are the substructures appropriate for this domain. They can be extracted simply by looking for sequences of high-volume conversation. For example, one training instance has a run of highs starting at time 3 and lasting for 1 timestep, a run starting at time 6 and lasting for 1 timestep, and a run starting at time 9 and lasting for 4 timesteps. Hence the set of LoudRuns produced from that training instance is {(3,1), (6,1), (9,4)}. These tuples are examples of instantiated features.
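A minimal sketch of this extraction step is given below. The function name `extract_loud_runs`, the zero-based time index and the example sequence are assumptions: the sequence is a hypothetical one chosen only so that its runs match those described in the text.

```python
from itertools import groupby


def extract_loud_runs(labels, first_time=0):
    """Return a (start, duration) tuple for every maximal run of 'H'.

    labels     -- string or list of 'H'/'L' segment labels
    first_time -- index given to the first segment (assumed to be 0)
    """
    runs = []
    t = first_time
    for symbol, group in groupby(labels):
        length = len(list(group))
        if symbol == "H":
            runs.append((t, length))
        t += length
    return runs


# Hypothetical instance with highs at time 3, time 6 and times 9 to 12.
print(extract_loud_runs("LLLHLLHLLHHHH"))  # [(3, 1), (6, 1), (9, 4)]
```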
These instantiated features can be plotted in the two-dimensional space shown in Figure 1, which has one axis for the start time and another for the duration. This is the parameter space.
[Figure 1: instantiated LoudRun features plotted in parameter space (start time against duration).]
Once the points are in parameter space, ``typical examples'' of LoudRuns can be selected. In this case, the points labelled A, B and C are selected, as shown in Figure 2. These are termed synthetic features. They may or may not coincide with an observed event: point A corresponds to a real event (the instantiated feature (3,3) was observed in the data), whereas B and C do not.
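[Figure 2: synthetic features A, B and C in parameter space, with region boundaries shown as dotted lines.]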
These synthetic features can be used to segment the parameter space into regions by computing the Voronoi tiling: for each point in the parameter space, the nearest synthetic feature is found. The set of points associated with each synthetic feature forms a region, and the boundaries of each region can be calculated. These are shown as dotted lines in Figure 2.
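Assigning a point to its Voronoi region only requires finding the nearest synthetic feature, which can be sketched as below. Several details are assumptions: Euclidean distance is used, A is placed at (3,3) and C at roughly (10, 3.33) as the text suggests, and the coordinates of B are purely hypothetical because the text does not give them.

```python
import math

# (start time, duration) of each synthetic feature.
# A and C follow values mentioned in the text; B is a hypothetical placeholder.
SYNTHETIC = {"A": (3.0, 3.0), "B": (4.0, 1.0), "C": (10.0, 3.33)}


def nearest_region(point, synthetic=SYNTHETIC):
    """Return the name of the synthetic feature closest to `point`."""
    return min(synthetic, key=lambda name: math.dist(point, synthetic[name]))


# The instantiated feature (9, 4) falls in C's region under these assumptions.
print(nearest_region((9, 4)))
```

The region boundaries themselves never need to be computed for this purpose; they matter only for drawing the tiling.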
The next step is to make use of these regions. Questions like ``does this training instance have an instantiated feature in A's region?'' can be asked. If the question is repeated for B and C, the result is Table 3. To construct this table, Table 2 is examined and, for each region, if an instantiated feature lies within it, the synthetic attribute corresponding to that region's point is marked ``yes''. The data is now in a learner-friendly format. In fact, if it is fed to C4.5, the simple tree in Figure 3 results.
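The construction of one row of such a table might look like the sketch below. As before, the Euclidean distance, the coordinates of B and the function name `synthetic_attributes` are assumptions, and the example instance is the illustrative set of LoudRuns discussed earlier rather than a row copied from the original tables.

```python
import math

# A and C follow values mentioned in the text; B is a hypothetical placeholder.
SYNTHETIC = {"A": (3.0, 3.0), "B": (4.0, 1.0), "C": (10.0, 3.33)}


def synthetic_attributes(loud_runs, synthetic=SYNTHETIC):
    """Mark a region 'yes' if any instantiated feature falls inside it."""
    attributes = {name: "no" for name in synthetic}
    for run in loud_runs:
        nearest = min(synthetic, key=lambda n: math.dist(run, synthetic[n]))
        attributes[nearest] = "yes"
    return attributes


# One row of the attribute table for a single training instance.
print(synthetic_attributes([(3, 1), (6, 1), (9, 4)]))
# {'A': 'no', 'B': 'yes', 'C': 'yes'}
```

Each training instance yields a fixed-length row regardless of how many LoudRuns it contains, which is what makes the result usable by a propositional learner such as C4.5.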
This tree says that if the training instance has an instantiated feature that lies within region C (i.e. a run of high values that starts around time t=10 and lasts for approximately 3.33 timesteps), then its class is Angry. In other words, as long as there is no long high-volume run towards the end of the conversation, the customer is likely to be happy.
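Read as code, the rule described by the tree is a single test on the synthetic attribute for region C. The sketch below assumes the attributes are held in a dictionary of ``yes''/``no'' strings; the function name `classify` and that representation are illustrative only.

```python
def classify(attributes):
    """Apply the one-test rule: a feature in region C means Angry."""
    return "Angry" if attributes.get("C") == "yes" else "Happy"


print(classify({"A": "no", "B": "yes", "C": "yes"}))  # Angry
print(classify({"A": "yes", "B": "no", "C": "no"}))   # Happy
```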