The second practical difficulty is extracting instantiated features
from noisy data. For example, if one is detecting a period of
increase, what if one suddenly gets a ``downwards spike'' caused by
non-Gaussian noise (e.g. in the case of the sign language recognition
using instrumented gloves, by bogus ultrasound reflections that
interfere with positional triangulation used to calculate the y values
shown)? One could argue that this is a preprocessing issue and not
really an issue for the learner. Even so, preprocessing will never get rid of all of
the noise, although it may minimise it. Our extraction functions must
be designed to be robust to noise. It is more important to detect a
generally increasing signal, even if it means that there are short
periods within the signal where the property of monotonic increase is
not strictly true. For example, Figure 4.8 shows another
recording of the y channel of the sign building, but this one
with much greater noise than in Figure 4.6.
[Figure 4.8: a noisier recording of the y channel of the sign building]
This makes the implementation of such an extraction function
difficult; and in general the pursuit of the ultimate extraction
function could be as consuming as the whole process of learning to
classify in the first place. Certainly, it would be possible to define
a group of metafeatures IsInstanceOfClassA,
IsInstanceOfClassB, etc., that operate on all channels
simultaneously and whose parameters consist of a single boolean
value telling us whether the stream belongs to a particular class. Such
metafeatures would be immensely useful, but if we knew how to
develop the extraction functions IsInstanceOfClassX, then we
would not need to learn in the first place, since all we need to
know would be encapsulated in our metafeatures.
Hence, like most things in engineering, the development of the extraction function is a trade-off. If the extraction function is made too simple, a useful representation will not reach the learning algorithm. If a great deal of time is spent developing the extraction function, this undercuts the whole purpose of using a learner.
The surprising fact, however, which we demonstrate in the remainder of the thesis, is that even with a very primitive group of metafeatures, it is possible to obtain high levels of performance in temporal domains.
Consider implementing the Increasing metafeature and applying it to the y channel. Also assume that we use the representation (midtime, average, gradient, duration). Consider the period of increase from time 32 to time 45 in Figure 4.8. Depending on how we implement the extraction function, it might ignore the small dips in the generally increasing pattern within that interval and report a single event, or it might break the interval up into three separate events: an increase from time 34 to 35, an increase from time 37 to 39 and an increase from time 40 to 43. One way to ignore the dips is to allow a certain number of exceptions to strict increase; another is to ``smooth'' the data to get rid of such bumps. In any case, Figures 4.9 and 4.10 show the instantiated features extracted by applying one implementation of the Increasing metafeature to Figures 4.6 and 4.8 respectively. Note that the two sets of events are in many ways similar, which suggests that our filtering and noise handling techniques have worked to some extent.
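To make the two noise handling options concrete, the following is a minimal sketch of how an extraction function for Increasing might be written. It is not the implementation used to produce Figures 4.9 and 4.10. The names extract_increasing, smooth, max_exceptions and min_duration are hypothetical, and the sketch assumes one sample per time step, so that a sample's index serves as its time.

```python
from typing import List, Sequence, Tuple

# (midtime, average, gradient, duration), as in the representation above.
Event = Tuple[float, float, float, float]


def smooth(values: Sequence[float], window: int = 3) -> List[float]:
    """Centred moving average; the window shrinks at the ends of the signal."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def extract_increasing(values: Sequence[float],
                       max_exceptions: int = 1,
                       min_duration: int = 2) -> List[Event]:
    """Return events for runs that generally increase.

    Assumption: up to max_exceptions consecutive non-increasing steps are
    tolerated inside a run, provided the signal starts rising again.
    """
    events: List[Event] = []
    n = len(values)
    i = 0
    while i < n - 1:
        if values[i + 1] <= values[i]:
            i += 1  # not the start of an increase
            continue
        start, end = i, i + 1  # end = last sample reached by a rising step
        misses = 0
        j = end
        while j < n - 1:
            if values[j + 1] > values[j]:
                misses = 0
                end = j + 1
            else:
                misses += 1
                if misses > max_exceptions:
                    break
            j += 1
        duration = end - start
        # Discard short runs and runs whose net change is not upwards.
        if duration >= min_duration and values[end] > values[start]:
            segment = values[start:end + 1]
            events.append((
                start + duration / 2,                      # midtime
                sum(segment) / len(segment),               # average
                (values[end] - values[start]) / duration,  # gradient
                float(duration),                           # duration
            ))
        i = end  # resume the search after this run
    return events


# A rising signal with one small dip at index 3.
signal = [0, 1, 2, 1.5, 3, 4, 3, 2, 1]
strict = extract_increasing(signal, max_exceptions=0)    # two short events
tolerant = extract_increasing(signal, max_exceptions=1)  # one merged event
```

With max_exceptions=0 the dip splits the rise into two events, whereas with max_exceptions=1 the whole rise is reported as a single event of duration 5. Passing smooth(signal) instead of signal is the alternative route: the dip is reduced before extraction rather than tolerated during it. Either choice trades sensitivity to genuine short decreases for robustness to spikes.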