Symbolic learning algorithms can be described as algorithms that determine a set of rules that form a relationship between the attributes and the classes. There are a variety of tools that work in this way, and they vary in the way that they construct these rules. For example, it might try to build rules in the form of nested if statements.
An example of such a tool is C4.5. C4.5 uses the concept of ``information gain'' to make a ``tree'' of decisions that lead it to the final outcome. The information gain can be described as the effective decrease in entropy (usually measured in terms of ``bits'') resulting from making a choice as to which attribute to use and at what level. For example, if I choose a specified attribute to discriminate at a given point, it will have some impact on how well the system can tell the classes apart. By considering which of the attributes is the best to use as a discriminant at a particular node in the tree, we can build up a tree of decisions that allows us to navigate from the root of the tree to a leaf node by continually examining attributes.
An example of such a tree is shown below
:
Thumb Bend <= 1 : Is Not a ``B'' (34.0/1.0) Thumb Bend > 1 : | Ring Finger Bend <= 2 : Is a ``B'' (7.0) | Ring Finger Bend > 2 : Is Not a ``B'' (3.0)
The graphical representation of the tree is shown below:
Figure 2.10: A graphical representation of the decision tree
produced by C4.5
As can be seen, by looking at attributes of the data about which we are trying to draw a conclusion, we can reach a leaf-node which will tell us what the output should be. This system is attuned to discrete outputs.
Essentially, however, and similarly to neural nets, symbolic learning algorithms do partition the space. This is shown in Figure 2.11. In general, there can be more than two partitions, and there can be many more attributes. In this situation in fact, there are eight attributes, but only two are interesting. As should be obvious, it is not usually sufficient to determine what a sign is, just by looking at the ring finger and thumb, it just happens to be in this case (for reasons of simplicity).
Figure 2.11: The decision tree discussed so far
partition the space as shown.
Note, however, that some, such as C4.5, can only consider one attribute at a time -- and can only segment the space into ``orthogonal'' pieces -- that is it can only divide the space parallel to the axes. Imagine, for example, that a concept ``boundary'' (i.e. the values along which the real instances of the underlying data change from one classification to another - i.e. on one side of the boundary they are positive cases, and on the other they are negative) that was not horizontally or vertically aligned, but diagonal. Such a concept could only be approximated by a decision tree, as shown in figure 2.12.
Other learning algorithms can segment the space more arbitrarily. Neural nets and instance-based learning algorithms can do this ``attribute space'' segmentation in a much more flexible way, allowing them to cope more easily with non-orthogonal concepts. Of course, in theory, if the attributes were carefully selected for C4.5 the concept boundaries would be orthogonal to the attributes, but this violates one of the objectives of learning algorithms, since if we knew the concept boundary, we wouldn't be using C4.5 anyway.
Figure 2.12: An example where the limitations of C4.5
come through. If the concept boundary is non-orthogonal, C4.5 has to
make some approximations.
There are other similar systems that can provide variations, and not all symbolic learners have the limitations that C4.5 does. Some, for instance, can apply a function of the attributes at each node, rather than always having an absolute value. Another is to have non-binary trees, with it being possible to partition according to ranges of values of attributes. Another variation is that rather than having a discrete output, it is possible that different functions be applied in different partitioned areas of the space.