A metafeature is an abstraction of some substructure that is observed within the data. In the temporal domain, the substructures we are interested in are sub-events, like the LoudRuns of the Tech Support domain. Formally, a metafeature has two parts:
For every instance in the training set
that we apply a
metafeature to, a set of points in the parameter space are returned.
Each point returned represents an ``occurrence'' or ``instance'' of
the metafeatures, which we will term an instantiated feature.
In other words, an instantiated feature is a tuple
, where
are values for each of
the parameters.
The parameter space represents all the possible instantiated features we could possibly generate or observe in the data. Hence all instantiated features lie in the parameter space.
A simple example is the LoudRun metafeature. As previously
outlined, the LoudRun metafeature has two parameters for each
instance: the starting time
and the duration
. Hence, our
parameter space is two dimensional. In fact, it is easy to show that
our parameter space is
, although sometimes it will be
convenient to treat it as
.
In this case, it is easy to define the LoudRun extraction
function as:
In other words, the extraction function
, when applied to an
training stream finds all the subsections of the channel when there is
an extended period of high-volume conversation, flanked by low-volume
conversation on either side
.
This matches our intuition of a loud run.