Smoothing and filtering are two approaches to the same problem that originate in different fields. Filtering comes from signal processing, while smoothing comes from statistics.
Filters are functions of the input signal. For example, let a single channel of data be represented as $x[t]$, and the output data as $y[t]$. Then any linear finite impulse response (FIR) filter can be thought of as a function:
$$y[t] = \sum_{i=-k}^{k} a_i \, x[t+i]$$
The $a_i$ are termed the weights of the filter. The art and science of filter design is in the appropriate setting of these weights. For example, a simple 5th-order moving average filter might be:
$$y[t] = \frac{1}{5}\left(x[t-2] + x[t-1] + x[t] + x[t+1] + x[t+2]\right)$$
This averages five points to create the output. There are also non-linear filters. Non-linear FIR filters cannot be expressed as a linear combination of the inputs, but only as some other (non-linear) function of the inputs. A simple example of a useful non-linear filter is a 5th-order median filter. This is the filter represented by:
$$y[t] = \operatorname{median}\left(x[t-2], x[t-1], x[t], x[t+1], x[t+2]\right)$$
This type of filter is extremely useful for data with non-Gaussian noise, removing outliers very efficiently. A significant amount of research effort has gone into the development of appropriate filters for various purposes.
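To make the contrast between the two filters concrete, the following is a minimal Python sketch, not the implementation used in this thesis. It assumes a centred window, writes the input channel as `x`, and simply drops the positions at either end where a full window does not fit. The function names and the example `signal` are hypothetical and chosen only for illustration.

```python
from statistics import median


def fir_filter(x, weights):
    """Centred linear FIR filter: each output is a weighted sum of the window.

    `weights` is assumed to have odd length 2k+1. Only positions where the
    whole window fits are returned, so the output is 2k samples shorter.
    """
    k = len(weights) // 2
    return [
        sum(a * x[t + i] for a, i in zip(weights, range(-k, k + 1)))
        for t in range(k, len(x) - k)
    ]


def median_filter(x, width=5):
    """Centred median filter over `width` points (width assumed odd)."""
    k = width // 2
    return [median(x[t - k:t + k + 1]) for t in range(k, len(x) - k)]


# Hypothetical channel: a steady ramp in which the sample at index 4
# (true value 5.0) has been replaced by a gross outlier.
signal = [1.0, 2.0, 3.0, 4.0, 50.0, 6.0, 7.0, 8.0, 9.0]

moving_avg = fir_filter(signal, [0.2] * 5)  # five equal weights
med = median_filter(signal, 5)

print(moving_avg)
print(med)
```

In this example the outlier pulls every moving-average output well above the underlying ramp, because it contributes one fifth of each window it falls in. The median outputs stay within one unit of the ramp, since a single extreme value never becomes the middle element of a five-point window.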
Statistics has taken a different tack to the problem: early approaches were similar to moving average filters. However, rather than using a simple moving average, early work recognised that linear regression could be applied around the point being estimated; in other words, rather than simply averaging the five values around a point, a least-squares linear fit to those points could be used to give a better-looking result. Further, it was realised that (a) if linear regression could be applied, so could other shapes, in particular splines, and (b) the weights for the instances used in the regression could be changed. This led to further work by Cleveland [WCS92]. Friedman's super smoother [Fri84] extends Cleveland's work by automatically setting parameter values based on minimising cross-validated error.
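The idea of fitting a weighted line around each point can be sketched as follows. This is only an illustration under stated assumptions: it uses a fixed window of neighbours and tricube weights, omits the robustness iterations of Cleveland's procedure and the cross-validated parameter selection of the super smoother, and is not the R implementation used in this thesis. The names `tricube` and `local_linear_smooth` are hypothetical, and the sketch assumes at least two points with distinct x values and `half_width >= 1`.

```python
def tricube(d, h):
    """Tricube weight: 1 at distance 0, falling smoothly to 0 at distance h."""
    u = min(abs(d) / h, 1.0)
    return (1.0 - u ** 3) ** 3


def local_linear_smooth(xs, ys, half_width=2):
    """Smooth ys with a weighted least-squares line fitted around each point.

    Each window holds up to 2*half_width+1 neighbours (fewer at the ends).
    Neighbours further from the target point receive smaller weights.
    """
    n = len(xs)
    smoothed = []
    for t in range(n):
        lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
        wx, wy = xs[lo:hi], ys[lo:hi]

        # Choose the bandwidth a little beyond the farthest neighbour so
        # that every point in the window keeps a positive weight.
        max_dist = max(abs(x - xs[t]) for x in wx)
        h = max_dist * (half_width + 1) / half_width
        w = [tricube(x - xs[t], h) for x in wx]

        # Closed-form weighted least squares for a straight line.
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, wx)) / sw
        my = sum(wi * y for wi, y in zip(w, wy)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, wx))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, wx, wy))
        slope = sxy / sxx

        # Evaluate the fitted line at the target point.
        smoothed.append(my + slope * (xs[t] - mx))
    return smoothed


# Hypothetical data: a straight line with small alternating disturbances.
xs = [float(i) for i in range(9)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
print(local_linear_smooth(xs, ys))
```

One consequence of fitting a line rather than averaging is that the smoother returns an estimate at every point, including the ends of the series, and reproduces exactly linear data unchanged. A plain moving average has no full window at the ends.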
These smoothers have also been included in statistical software. In this thesis we use the implementation provided by the statistics package R [Hor01] to do smoothing. In later chapters we will show the impact that smoothing has.
Filtering and smoothing each have their advantages. Filter design allows the use of domain knowledge to overcome domain-specific problems, while smoothing is flexible enough to be used more independently of the domain.
The main reason that smoothing is useful is that it allows the metafeature extraction functions to be simpler. Rather than devoting a lot of effort to making the metafeature extraction functions robust to noise, we can smooth the data first, which simplifies their implementation.