
Many features, not enough data

If we do choose to use the raw data, even in truncated form, we still have many features to deal with. For example, consider a domain with 10 classes, 10 channels and an average stream length of 50 frames. ``Flattening'' a stream of average length would then give approximately 500 features (10 channels times 50 frames).
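As a rough illustration of the feature count, the following minimal sketch flattens one stream into a single feature vector. It assumes a stream is stored as a list of frames, each frame holding one reading per channel, and that the stream has already been truncated or padded to exactly 50 frames; the name `flatten_stream` and the all-zero readings are hypothetical and used only for illustration.

```python
def flatten_stream(stream):
    """Concatenate the channel readings of every frame into one flat list."""
    return [value for frame in stream for value in frame]


n_channels = 10  # channels per frame, as in the example above
n_frames = 50    # assumed fixed stream length (the average from the example)

# Hypothetical stream: 50 frames, each with 10 placeholder channel readings.
stream = [[0.0] * n_channels for _ in range(n_frames)]

features = flatten_stream(stream)
print(len(features))  # 10 channels times 50 frames
```

Real streams vary in length, so any such flattening has to fix a frame count first, which is why the figure of 500 is only approximate.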

Quinlan [personal discussion] points out a rule of thumb: as a bare minimum for most learning tasks, there should be at least as many training instances per class as there are features. With 500 features and 10 classes, we would therefore need at least 5000 training streams for the simple example above. This is only a simple empirical heuristic, but it highlights the problems that a large number of features introduces.
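The arithmetic behind this estimate can be written out as a short sketch. The function name `min_training_instances` is hypothetical, and the calculation simply applies the rule of thumb as stated above (instances per class at least equal to the number of features) to flattened streams.

```python
def min_training_instances(n_classes, n_channels, n_frames):
    """Lower bound on training streams under the rule of thumb:
    at least as many instances per class as there are features."""
    n_features = n_channels * n_frames  # features after flattening
    return n_features * n_classes       # that many instances for each class


# The example domain: 10 classes, 10 channels, 50 frames per stream.
print(min_training_instances(n_classes=10, n_channels=10, n_frames=50))
```

Because the bound is the product of all three quantities, doubling the stream length or the number of channels doubles the number of training streams the heuristic asks for.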



Mohammed Waleed Kadous
Tue Oct 6 13:04:40 EST 1998