An instance of the above architecture, called TClass, has been implemented. This implementation is designed in an object-oriented manner to allow different PEPs, global feature calculators, clustering algorithms and learners to be substituted.
Currently, the following global feature calculators are available: mean, median, mode, maximum value and minimum value for a channel. These can be applied to any channel.
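As a minimal sketch of what such global feature calculators compute, the following Python function derives the five statistics for one channel. The function name `global_features`, the representation of a channel as a list of numbers, and the tie-breaking rule for the mode are assumptions made for illustration; they are not taken from the TClass implementation.

```python
from collections import Counter
from statistics import mean, median


def global_features(channel):
    """Compute the five global features for one channel (a list of numbers)."""
    # Assumption: if several values tie for the mode, the one seen first wins.
    mode = Counter(channel).most_common(1)[0][0]
    return {
        "mean": mean(channel),
        "median": median(channel),
        "mode": mode,
        "max": max(channel),
        "min": min(channel),
    }
```

Because each calculator takes only a sequence of values, the same function can be applied to any channel of an instance.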
Similarly, the PEPs implemented are:
All of these use simple heuristics for dealing with noise. For instance, the local maximum PEP applies a smoothing filter to reduce noise before locating maxima.
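The smoothing heuristic for the local maximum PEP can be illustrated with a short Python sketch. The text does not say which filter TClass uses, so the centred moving average, the `width` parameter, and the `(time, value)` event format below are assumptions.

```python
def moving_average(values, width=3):
    """Smooth a channel with a centred moving average (window shrinks at the edges)."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def local_maxima(values, width=3):
    """Return (time, value) events at the local maxima of the smoothed channel."""
    s = moving_average(values, width)
    events = []
    for t in range(1, len(s) - 1):
        # A point is a maximum if it rises from the left and does not rise to the right.
        if s[t - 1] < s[t] >= s[t + 1]:
            events.append((t, s[t]))
    return events
```

With `width=1` no smoothing takes place and every small bump is reported as an event; a wider window suppresses maxima caused by noise.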
Currently, TClass uses k-means clustering, with all values normalised
by standard deviation. The confidence metric used for cluster membership
is the distance to the cluster's centroid. The learner can be either a
naïve Bayes learner or C4.5 [Quinlan, 1993] with the default
parameters.
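The clustering step described above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the TClass code: events are taken to be numeric tuples, normalisation divides each dimension by its population standard deviation, and the centroids are seeded with the first k events. All function names are hypothetical.

```python
import math
from statistics import pstdev


def normalise(points):
    """Divide every dimension by its standard deviation (assumed: population std)."""
    scales = [pstdev(dim) or 1.0 for dim in zip(*points)]
    return [tuple(x / s for x, s in zip(p, scales)) for p in points]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Plain k-means; assumption: centroids are seeded with the first k points."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids


def membership(point, centroids):
    """Return (cluster index, distance to its centroid); a smaller distance means higher confidence."""
    distances = [distance(point, c) for c in centroids]
    j = min(range(len(centroids)), key=distances.__getitem__)
    return j, distances[j]
```

Normalising first keeps a dimension with a large numeric range from dominating the distance used for both cluster assignment and the confidence value.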
Figure 7: Approach used in implementing TClass to simplify the
clustering task. The outputs of Classifier 1 through Classifier N are
combined by voting to make a final classification.
Empirically, it was found that the k-means clustering algorithm did
not perform well when all of the events from different classes were
clustered simultaneously. Thus a slight modification was made to the
architecture; the basic architecture was replicated from the event
clustering stage onwards on a per-class basis, as shown in figure
7. Event clustering is now performed only on
events coming from instances of the same class. These clusters are
then used to extract the synthetic events as before. All training
instances are then attributed
with these synthetic events, creating a set of synthetic
event attributes. These class-specific synthetic event attributes are
combined with the class-independent global features, creating a set of
features that are used by the learner to induce a classifier. Because
one such classifier is constructed per class, n classifiers are induced
when there are n classes.
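The per-class training procedure can be outlined as the following Python sketch. The four callables `cluster_events`, `attribute`, `global_features` and `induce` are hypothetical placeholders for the clustering algorithm, the attribution step, the global feature calculators and the learner; their signatures, and the representation of an instance as an `(events, channels)` pair, are assumptions made for illustration.

```python
def train_per_class(instances, labels, cluster_events, attribute, global_features, induce):
    """Induce one classifier per class.

    instances: list of (events, channels) pairs (assumed representation)
    labels:    class label of each instance, in the same order
    Returns a dict mapping each class to (synthetic events, classifier).
    """
    models = {}
    for cls in sorted(set(labels)):
        # Cluster only the events that come from instances of this class.
        class_events = [
            event
            for (events, _), label in zip(instances, labels)
            if label == cls
            for event in events
        ]
        synthetic = cluster_events(class_events)
        # Attribute ALL training instances with this class's synthetic events,
        # then append the class-independent global features.
        rows = [
            attribute(events, synthetic) + global_features(channels)
            for events, channels in instances
        ]
        models[cls] = (synthetic, induce(rows, labels))
    return models
```

Note that only the clustering is restricted to one class; every classifier is still trained on all instances and all labels.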
To classify an unseen instance, event extraction and global feature calculation are applied. For each class, the instance is then attributed with that class's synthetic events, generating the synthetic event attributes required by the classifier associated with that class. The instance is then classified by each classifier, so that n classifications are made. The most common classification is taken as the final prediction.
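The voting step can be sketched in Python as shown below. The `per_class_models` mapping and the `attribute` and `predict` callables are hypothetical names introduced for illustration. The text does not say how ties are broken; this sketch simply keeps the label that reached the top count first.

```python
from collections import Counter


def classify(instance, per_class_models):
    """Classify an instance by majority vote over the per-class classifiers.

    per_class_models: dict mapping class -> (attribute, predict), where
    attribute(instance) builds that class's feature vector and
    predict(features) returns a class label (both assumed interfaces).
    """
    votes = []
    for attribute, predict in per_class_models.values():
        features = attribute(instance)   # class-specific synthetic event attributes
        votes.append(predict(features))  # one classification per class
    # Assumption: ties go to the label that was voted for first.
    return Counter(votes).most_common(1)[0][0]
```

For example, if three classifiers return "a", "a" and "c", the final prediction is "a".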