Because the number of frames is not fixed, we do not know a priori how many features there are. For example, one instance might last for 43 frames and another instance of the same class for 57 frames. How do we match these? One way is to ``pad out'' every instance to the maximum number of frames; another is to truncate instances to some maximum limit. Neither solution is very elegant; truncation in particular may discard data that is critical to classification.
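The padding and truncation options can be sketched as follows. This is only an illustration under stated assumptions: the function name `fix_length`, the parameter names, and the choice of zero as the padding value are hypothetical and do not come from the text, and each frame is assumed to carry the same number of channel values.

```python
from typing import List, Sequence


def fix_length(
    frames: Sequence[Sequence[float]],
    n_frames: int,
    pad_value: float = 0.0,  # assumption: pad with zeros
) -> List[float]:
    """Flatten a variable-length instance into a fixed-length feature vector.

    Instances longer than n_frames are truncated; shorter ones are padded
    with frames filled with pad_value.
    """
    if not frames:
        raise ValueError("an instance needs at least one frame")
    width = len(frames[0])  # values per frame, assumed constant

    # Truncate: frames beyond n_frames are dropped (information is lost here).
    kept = [list(frame) for frame in frames[:n_frames]]

    # Pad: range() of a non-positive count is empty, so long instances get no padding.
    padding = [[pad_value] * width for _ in range(n_frames - len(kept))]

    # Flatten frame by frame into one attribute-value vector.
    return [value for frame in kept + padding for value in frame]


# Two instances of the same class, 43 and 57 frames long, two values per frame.
short_instance = [[float(i), float(-i)] for i in range(43)]
long_instance = [[float(i), float(-i)] for i in range(57)]

short_vector = fix_length(short_instance, n_frames=50)
long_vector = fix_length(long_instance, n_frames=50)
```

Both vectors now have the same number of attributes, so an attribute-value learner can accept them, but the last seven frames of the longer instance are gone and the last seven frames of the shorter one are artificial.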
One solution is to use something other than an attribute-value learner, for example a relational learner, whose observation language allows descriptions that are not limited to a fixed set of attributes. This may work if the learner can cope with large amounts of data (see next section). Most existing relational learning techniques do not meet this criterion.