Next: About this document ... Up: Early versions of TClass Previous: Line-based segmentation   Contents


Per-class clustering

Using line segments as the sole metafeature, the first ``real'' iteration of TClass was implemented, using the following algorithm:

Early results were disappointing. Further investigation revealed that the problem lay in the clustering stage: when all line segments were placed in the parameter space together, no clear clusters emerged. To illustrate the problem, Figure A.2 shows the parameter space for local maxima of the y channel from the Flock sign domain. Clustering in this space is clearly difficult; in fact, K-means clustering did not converge even after 50 iterations.

Figure A.2: The parameter space of local maxima of the y channel in the Flock sign domain. All instantiated local maxima are shown.

Hence, we sought another solution: per-class clustering. Rather than clustering all instances at once, only instances belonging to a single class are clustered at a time. Figure A.3 shows instantiated features for two classes. As can be seen, clustering either of these two classes would be relatively easy compared to clustering the instances in Figure A.2.

Figure A.3: The parameter space of local maxima of the y channel in the Flock sign domain, but shown for only two classes.

Hence, early versions of TClass, as published in [Kad98] and [Kad99], use per-class clustering. For each class, cluster instances from only that class. Use the clusters generated to re-attribute all training instances. Create a classifier using the learner with only the synthetic features generated from that class and the global features. In general, if there are $c$ classes, $c$ classifiers will be generated. These $c$ classifiers vote to give a final classification. This could be considered a ``brute force'' way of doing directed segmentation using an undirected clustering algorithm.
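The per-class procedure can be sketched in a few lines of Python. This is only an illustration under several assumptions that are not taken from TClass itself: events are points in a two-dimensional parameter space, a synthetic feature is 1 when some event of an instance lies within a fixed radius of a cluster centroid, a 1-nearest-neighbour rule stands in for the learner, and global features are left out. All function names, the radius and the toy data are hypothetical.

```python
import random
from collections import Counter

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns k centroids as tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append(p)
        updated = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:  # assignments stopped changing
            break
        centroids = updated
    return centroids

def attribute(events, centroids, radius=0.1):
    """Assumed attribution scheme: feature i is 1 if any event
    falls within `radius` of centroid i, else 0."""
    return tuple(
        int(any(dist2(e, c) <= radius ** 2 for e in events)) for c in centroids
    )

def predict_1nn(rows, feats):
    """Stand-in learner: label of the closest memorised training row."""
    return min(rows, key=lambda row: dist2(row[0], feats))[1]

def train_per_class(instances, k=2):
    """instances: list of (events, label). Builds one classifier per class."""
    ensemble = []
    for cls in sorted({label for _, label in instances}):
        # Cluster only the events that come from instances of this class.
        own = [e for events, label in instances if label == cls for e in events]
        centroids = kmeans(own, k)
        # Re-attribute ALL training instances with this class's clusters.
        rows = [(attribute(events, centroids), label) for events, label in instances]
        ensemble.append((centroids, rows))
    return ensemble

def classify(ensemble, events):
    """Each of the c classifiers votes; the most common label wins."""
    votes = Counter(
        predict_1nn(rows, attribute(events, centroids))
        for centroids, rows in ensemble
    )
    return votes.most_common(1)[0][0]

# Toy data: both classes share a lower cluster near (0.5, 0.1).
training = [
    ([(0.20, 0.90), (0.50, 0.10)], "A"),
    ([(0.22, 0.88), (0.52, 0.12)], "A"),
    ([(0.18, 0.92), (0.48, 0.08)], "A"),
    ([(0.80, 0.90), (0.50, 0.10)], "B"),
    ([(0.82, 0.88), (0.52, 0.12)], "B"),
    ([(0.78, 0.92), (0.48, 0.08)], "B"),
]
ensemble = train_per_class(training)
print(classify(ensemble, [(0.79, 0.91), (0.51, 0.09)]))
```

With two classes the sketch builds two classifiers, one per class, and a tie between them is broken arbitrarily by vote order. The toy data deliberately places a cluster near (0.5, 0.1) in both classes, so each classifier ends up with a synthetic feature for it.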

Obviously, this solution is not ideal. Firstly, the learner must be run $c$ times, which is slow. Secondly, if we look at Figure A.3, the lower two clusters would generate synthetic features in both learning tasks, creating, for all intents and purposes, two identical features. Thirdly, comprehensibility is poor, since a total of $c$ rulesets or decision trees must be examined.

Early papers used the term PEPS (short for parametrised event primitives) rather than metafeatures. It was not realised until after the publication of these documents that metafeatures might well have applications beyond temporal classification.


Mohammed Waleed Kadous 2002-12-10