next up previous contents
Next: Graph-based induction Up: Alternative approaches Previous: Approximate string-matching approaches   Contents


Inductive Logic Programming

After the development of metafeatures, we considered using inductive logic programming as a means of learning definitions. Inductive logic programming is a type of machine learning where the background and domain knowledge are expressed in terms of logical predicates. In particular, each instance was presented as a collection of logical predicates.

To do so, the following predicates were introduced. instance(X) indicated that X was a training instance. class(X) was used to denote the possible classes of each instance. Finally a predicate called classification(X, Y) was also defined, linking each instance with its class.

Data was provided in the form of predicates based on metafeatures, but using only the extraction functions, rather than the parameter space. For example, there was an Increasing predicate that took the form Increasing(InstanceId, Channel, Gradient, Average, Duration, EventId).

We also gave an extensional definition of the temporal relations between events. Our hope was that these temporal relations could be used by the ILP system. To begin with, we defined two relations, with plans of implementing full Allen's interval logic [All83]. The two relations were during(EventId1, EventId2) and after(EventID1, EventID2).

Unfortunately, this approach did not work well for a number of reasons. FOIL [Qui90], GOLEM [MF92] and PROGOL [Mug95] were all tested on simplified datasets with unsuccessful results. Each did not succeed for different reasons. The first was the size of the dataset. Even with a small number of events and training instances, the number of generated predicates was huge - especially that extensional definitions of temporal relations had to be generated for FOIL and GOLEM. Secondly, these three ILP systems do not handle noise well at all, when the detection of sub-events of time series is inherently noisy. Thirdly, FOIL and GOLEM do not handle numerical attributes at all, and PROGOL does not do it particularly well, requiring a discretisation stage. Fourthly, the above representation is not deterministic; for example, consider the after predicate. Consider a simple sequence of events that occur in the same instance in the order 1,2,3,4,5,6,7. Then after(2,1) is true, but so is after(3,1) and so on. This means, for example, that if a clause is of the form after(X,1) then X can take many values. This has dramatic consequences. For a specific-to-general learner like GOLEM, this means that bottom clauses are going to be gigantic. For general-to-specific learners like FOIL, this means that the search space of possible clauses is going to be very large. Fifthly, the previous effect is worsened by the need for clauses composed of many predicates. Even a relatively simple temporal concept such as ``channel A goes up while channel B goes down'' would require a total of 6 predicates in our representation. Most ILP systems do not handle long clauses very well.

However, some hope for future work has been given by the work of Rodriguez et al [RAB00]. Although the work is preliminary and makes use of highly simplified versions of the same datasets we employ (for example, he deals with a variant of the Auslan dataset provided by us but using only ten classes, and all instances are downsampled to a fixed length of 20 samples); one of the representations he uses that turns out to function well is a ``true-percentage'' predicate. A true-percentage predicate takes the form true-percentage(Instance, Channel, TimeInterval, Min, Max, Percentage). In other words, if a channel's value is between Min and Max at least Percentage of the time in the TimeInterval on the channel Channel, the predicate is true.

This predicate has the advantage of being robust to noise, since the assertions don't have to be absolutely true - they can be true 80 per cent of the time, for instance. Secondly, they are self-contained, removing the need for non-deterministic expressions of predicates. However, they require a very careful and customised definition of the search function. If one contemplates the above for a moment, it is clear if one naively extracted all the true-percentage predicates from the training instances, then very large numbers of predicates would be generated. The net result is that Rodriguez et al have to carefully constrain the search algorithm of the ILP in order to induce reasonable clauses. Their results show that the true-percentage measure is of some benefit combined with other predicates (e.g. Euclidean distance measures, etc). It is not clear whether such techniques generalise to larger, real-world domains, but it does point to the possibility of temporal classification using ILP.


next up previous contents
Next: Graph-based induction Up: Alternative approaches Previous: Approximate string-matching approaches   Contents
Mohammed Waleed Kadous 2002-12-10