Data is frequently presented at a high sampling rate. For example, sign language information about the hand might be received at 200 frames per second, so an average 2-second sign might be 400 frames long. Similarly, with ECG data, the electrical signals may be sampled at 500 samples per second (sometimes also described as 500 Hertz); hence an individual heartbeat consists of roughly 430 to 600 samples (corresponding to heart rates between 70 beats per minute and 50 beats per minute). This can be a huge amount of data to consider.
Therefore, the data could be resampled at a lower frequency, which is commonly called downsampling. Under certain circumstances, this may not adversely affect learning and classification performance, while at the same time reducing feature extraction time. For example, say we were examining the sign language data at 200 frames per second. For each channel, we could average every two consecutive samples of the original data, so that each pair of input frames yields a single output frame. In this case we are applying a downsampling factor of two: the data is reduced in size by a factor of 2. Hence, a 400-frame sample would be reduced to a 200-frame sample, halving the amount of data that must be dealt with. Similarly, a downsampling factor of 4 would reduce the data to one quarter of its original size.
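The averaging scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name downsample_mean, the list-of-frames layout, the made-up two-channel signal, and the choice to drop trailing frames that do not fill a complete group are all assumptions, not part of the original method description.

```python
def downsample_mean(frames, factor):
    """Average each run of `factor` consecutive frames into one output frame.

    `frames` is a list of frames; each frame is a list holding one value
    per channel. Trailing frames that do not fill a complete group are
    dropped (an assumption made for this sketch).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_groups = len(frames) // factor
    out = []
    for g in range(n_groups):
        group = frames[g * factor:(g + 1) * factor]
        # zip(*group) regroups the values channel by channel.
        out.append([sum(channel) / factor for channel in zip(*group)])
    return out


# A hypothetical 400-frame, 2-channel recording (2 seconds at 200 frames/s).
signal = [[float(t), 2.0 * t] for t in range(400)]

halved = downsample_mean(signal, 2)      # downsampling factor of 2
quartered = downsample_mean(signal, 4)   # downsampling factor of 4

print(len(halved), len(quartered))  # 200 100
print(halved[0])                    # [0.5, 1.0]
```

With a factor of 2 the 400-frame sample becomes 200 frames, and with a factor of 4 it becomes 100 frames, matching the reductions described in the text.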
Of course, averaging works well for continuous attributes, but not so well for discrete attributes, where the mean of several values may not be a valid value at all. For discrete attributes, other methods can be employed, such as voting.
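One plausible reading of voting is a majority vote: each group of consecutive frames is replaced by the value that occurs most often within it. The sketch below assumes that reading. The function name downsample_vote, the hand-shape labels, and the tie-breaking behaviour (the value seen first in the group wins) are illustrative assumptions rather than details given in the text.

```python
from collections import Counter


def downsample_vote(values, factor):
    """Replace each run of `factor` consecutive discrete values with the
    most frequent value in that run.

    Trailing values that do not fill a complete group are dropped
    (an assumption made for this sketch).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_groups = len(values) // factor
    out = []
    for g in range(n_groups):
        group = values[g * factor:(g + 1) * factor]
        # most_common(1) yields the most frequent value; on a tie, the
        # value encountered first in the group is returned.
        out.append(Counter(group).most_common(1)[0][0])
    return out


# Hypothetical discrete attribute: a hand-shape label per frame.
handshape = ["fist", "fist", "flat",
             "flat", "flat", "point",
             "point", "point", "flat"]

print(downsample_vote(handshape, 3))  # ['fist', 'flat', 'point']
```

Unlike averaging, the output here is always one of the values that actually appeared in the input, so it remains a valid value of the discrete attribute.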