To decide which components to use for a particular problem domain,
TClass uses an application file (usually given the suffix ``.tal'',
short for TClass Application List). This file describes which global
features, metafeatures and parameter space segmenters to use for the
domain. An example component description file for the Tech Support
domain is shown in Figure 5.13.
The first part of the file in Figure 5.13 describes the global feature extractors. The first computes the mean volume level (L is treated as 0 and H as 1, so this is effectively the same as calculating the percentage of time the conversation volume is high). The format for the entries in the global section is:
global <attribute name> <feature type> {
<parameter> <value>
...
<parameter> <value>
}
Figure 5.13 shows a typical example of a global declaration. It creates a global attribute called V-mean, which uses a mean global feature extractor (other types of global feature extractor include min and max). The mean global feature extractor accepts the parameter channel, which tells it which channel to extract the mean for. The channel must be listed in the domain description file. More information about globals can be found in Section 5.5.3.
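The mean computation described above can be illustrated with a minimal Python sketch. This is not TClass code: the function name mean_volume and the representation of a channel as a list of ``L''/``H'' symbols are assumptions made purely for illustration.

```python
def mean_volume(channel_values):
    """Mean of a two-valued channel, treating 'L' as 0 and 'H' as 1.

    Hypothetical helper: TClass's own extractor is configured through
    the application file, not called like this.
    """
    encoding = {"L": 0, "H": 1}
    if not channel_values:
        raise ValueError("cannot take the mean of an empty channel")
    return sum(encoding[v] for v in channel_values) / len(channel_values)


# Volume is high in 4 of the 8 samples, so the mean is 0.5.
print(mean_volume(list("LLHHHLHL")))
```

Because the two symbols are mapped to 0 and 1, the mean is exactly the fraction of samples in which the volume is high.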
The next section describes which metafeatures to use. Each metafeature application is described in a section like:
metafeature <metafeature name> <metafeature type> {
<parameter> <value>
...
<parameter> <value>
}
The metafeature's name is later used in the segmentation sections. The type governs which kind of metafeature will be applied. For instance, in Figure 5.13, the rle (short for run-length encoding) metafeature is a straightforward generalisation of the LoudRun metafeature. In this case, we are looking for ``runs'' on the channel V that have a length of at least 1. We are also only interested in runs of high volume, not low volume; hence we limit our interest to runs of ``H''s.
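To make the run-length idea concrete, the following Python sketch finds runs of selected symbols on a channel. It is only an illustration under stated assumptions: the function name runs, the parameter names limit_values and min_run, and the choice to report each run as a (start index, length) pair are all hypothetical and are not taken from the TClass implementation.

```python
def runs(channel_values, limit_values=("H",), min_run=1):
    """Return (start_index, length) for each run of a symbol in limit_values.

    Illustrative only: parameter names and the output representation
    are assumptions, not TClass's actual rle metafeature interface.
    """
    found = []
    start = 0
    n = len(channel_values)
    while start < n:
        # Extend the run while the symbol stays the same.
        end = start
        while end < n and channel_values[end] == channel_values[start]:
            end += 1
        length = end - start
        if channel_values[start] in limit_values and length >= min_run:
            found.append((start, length))
        start = end
    return found


print(runs(list("LLHHHLHL")))             # [(2, 3), (6, 1)]
print(runs(list("LLHHHLHL"), min_run=2))  # [(2, 3)]
```

With a minimum run length of 1, every maximal block of ``H''s is reported; raising the minimum discards the shorter runs.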
The final section sets up parameter space segmentation. Typically, there will be an equal number of metafeature applications and parameter space segmenters. Each segmenter entry specifies which segmenter to apply to which metafeature. The general format is:
segmenter <attribute prefix> <segmenter type> {
<parameter> <value>
...
<parameter> <value>
}
The attribute prefix is the name that will be used for attribute
values (although a number will be appended indicating which centroid
it refers to). The segmenter type governs whether we are using k-means
(kmeans), expectation-maximisation (em) or directed segmentation
(directed). Segmenters take one required parameter: the metafeature
to apply the segmentation to. Hence, in Figure 5.13, we are building a
segmenter for the metafeature loudrun, which is a directed
segmenter (i.e., the random search algorithm described in Section
4.12). We specify that it should try 10000 random subsets for
centroids, and that it should use the disparity measure.
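All three kinds of entry share the same shape: a keyword, a name, a type, and a brace-delimited list of parameter/value pairs. The Python sketch below shows one way such a file could be read. It is not the TClass parser; the function parse_tal, the dictionary layout, and the assumption that each header, parameter and closing brace sits on its own line are ours. The sample entry is reconstructed from the prose description of V-mean rather than copied from Figure 5.13.

```python
def parse_tal(text):
    """Parse entries of the form '<kind> <name> <type> { <param> <value> ... }'.

    Hypothetical reader for illustration; assumes one header,
    parameter or closing brace per line.
    """
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if current is None:
            parts = line.split()
            if len(parts) != 4 or parts[3] != "{":
                raise ValueError(f"expected '<kind> <name> <type> {{', got {line!r}")
            kind, name, type_ = parts[:3]
            current = {"kind": kind, "name": name, "type": type_, "params": {}}
        elif line == "}":
            entries.append(current)
            current = None
        else:
            # Everything after the first space is taken as the value.
            param, _, value = line.partition(" ")
            current["params"][param] = value.strip()
    if current is not None:
        raise ValueError("unterminated entry: missing '}'")
    return entries


sample = """
global V-mean mean {
    channel V
}
"""
for entry in parse_tal(sample):
    print(entry["kind"], entry["name"], entry["type"], entry["params"])
```

Keeping the parameters as an uninterpreted dictionary mirrors the format itself, in which each component type decides which parameters it accepts.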
This describes all of the inputs into the TClass system. We now discuss some of the available components.