In Chapter 4, we discussed metafeatures in general terms, but we did not consider them from a computational point of view or say much about implementation details. Considering how metafeatures are applied suggests a natural division of their implementation into three subprocesses: instantiated feature extraction, synthetic feature construction, and training set attribution.
This sequence of steps is shown diagrammatically in Figure 5.1. These are exactly the steps followed in the Tech Support domain presented in Section 4.2: the training instances are shown in Table 4.1, and the results of instantiated feature extraction are shown in Table 4.2. Several possible outcomes of synthetic feature construction are shown in Table 4.5. Training set attribution leads to Table 4.10 (or, if relative membership is used, Table 4.11).
So far, only the training stage has been discussed. What changes when we want to apply the system to unseen instances? The main distinguishing factor of unseen instances is that their class label is not known.
In machine learning, a strict separation between training and test sets is usually enforced. Here, that separation must extend beyond the learning algorithm to the synthetic feature construction stage as well: if synthetic features were constructed using test data, the temporal learner would have access to information that should not be available to it.
However, there is no problem in using the synthetic features constructed during the training stage as an input to the attribution stage at test time. Figure 5.2 shows the typical procedural pipeline for using the system for temporal classification.
Note that, in this case, the synthetic features created in the training stage are provided as an input to the attributor. Note also that the attributor used at training time and the one used at test time need not differ in any respect but one: at training time, the attributor's output includes the class information that the propositional attribute-value learner uses to build its classifier, whereas at test time the output does not include the class information.
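The division of labour between the two stages can be sketched in Python. This is a minimal sketch under stated assumptions, not the system's actual code: the function names `train` and `classify`, the component signatures, and the toy `nearest_row_learner` are all hypothetical. What it is meant to show is that synthetic features are built from training data only, that the same attributor is reused at test time with the training-stage synthetic features, and that class labels reach only the learner.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical component signatures (assumptions, not the book's interfaces).
Extract = Callable[[Any], List[Any]]                     # instance -> instantiated features
Segment = Callable[[List[Tuple[Any, Any]]], List[Any]]   # (feature, class) pairs -> synthetic features
Attribute = Callable[[List[Any], List[Any]], List[Any]]  # (instantiated, synthetic) -> attribute row
Learn = Callable[[List[List[Any]], List[Any]], Callable[[List[Any]], Any]]


def train(examples: Sequence[Any], labels: Sequence[Any],
          extract: Extract, segment: Segment, attribute: Attribute, learn: Learn):
    instantiated = [extract(x) for x in examples]
    # Synthetic features are constructed from training data only.
    pairs = [(f, y) for feats, y in zip(instantiated, labels) for f in feats]
    synthetic = segment(pairs)
    # Training-time attribution; class labels are handed to the learner alongside the rows.
    rows = [attribute(feats, synthetic) for feats in instantiated]
    classifier = learn(rows, list(labels))
    return synthetic, classifier


def classify(instances: Sequence[Any], synthetic: List[Any],
             extract: Extract, attribute: Attribute, classifier) -> List[Any]:
    # Test time: reuse the training-stage synthetic features; no labels are involved.
    return [classifier(attribute(extract(x), synthetic)) for x in instances]


def nearest_row_learner(rows: List[List[Any]], labels: List[Any]):
    """Toy stand-in for a propositional learner: 1-nearest neighbour by Hamming distance."""
    def predict(row: List[Any]) -> Any:
        best = min(range(len(rows)),
                   key=lambda i: sum(a != b for a, b in zip(rows[i], row)))
        return labels[best]
    return predict


if __name__ == "__main__":
    # Toy components: features are distinct values, each becomes a synthetic feature,
    # and an attribute records whether the instance contains that value.
    extract = lambda s: sorted(set(s))
    segment = lambda pairs: sorted({f for f, _ in pairs})
    attribute = lambda feats, syn: [s in feats for s in syn]
    synthetic, clf = train([[1, 1, 2], [3, 3]], ["x", "y"],
                           extract, segment, attribute, nearest_row_learner)
    print(synthetic, classify([[2], [3, 9]], synthetic, extract, attribute, clf))
    # prints [1, 2, 3] ['x', 'y']
```

In the usage example, the value 9 in the second test instance does not correspond to any synthetic feature, so it is simply ignored by the attributor; test data never influences which synthetic features exist.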
To further clarify the role of each of these components, Figures 5.3, 5.4 and 5.5 show pseudocode for instantiated feature extraction, synthetic feature construction and attribution, respectively.
The algorithm for instantiated feature extraction takes a list of training examples and a list of their class labels, and returns a list of tuples, each consisting of a list of instantiated features and a class label. In Figure 5.3, assume there is an operation that appends an element to the end of a list; also assume that each metafeature has a method called extract that takes a training instance and, as defined in Chapter 4, returns a list of instantiated features.
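A Python sketch of this step follows. It assumes, hypothetically, that a metafeature is any object with an `extract` method as described above, and that the instantiated features produced by all metafeatures for one instance are collected into a single list. The `Metafeature` wrapper class, the function name `extract_instantiated_features` and the toy `local_max` metafeature (which returns the time and value of each interior local maximum) are illustrative only, not the book's definitions.

```python
from typing import Any, Callable, List, Sequence, Tuple


class Metafeature:
    """Hypothetical wrapper: a named metafeature with an extract() method."""

    def __init__(self, name: str, extract_fn: Callable[[Any], List[Any]]):
        self.name = name
        self._extract_fn = extract_fn

    def extract(self, instance: Any) -> List[Any]:
        # Returns the list of instantiated features found in one instance.
        return self._extract_fn(instance)


def extract_instantiated_features(
    examples: Sequence[Any],
    labels: Sequence[Any],
    metafeatures: Sequence[Metafeature],
) -> List[Tuple[List[Any], Any]]:
    """Return one (instantiated features, class label) tuple per training example."""
    if len(examples) != len(labels):
        raise ValueError("examples and labels must have the same length")
    result = []
    for instance, label in zip(examples, labels):
        features: List[Any] = []
        for mf in metafeatures:
            features.extend(mf.extract(instance))  # append each metafeature's output
        result.append((features, label))
    return result


def _local_maxima(series: Sequence[float]) -> List[Tuple[int, float]]:
    # Toy metafeature: (time index, value) of each strict interior local maximum.
    return [(t, series[t]) for t in range(1, len(series) - 1)
            if series[t - 1] < series[t] > series[t + 1]]


local_max = Metafeature("LocalMax", _local_maxima)
```

For example, `extract_instantiated_features([[1, 3, 2, 5, 4], [0, 1, 2]], ["a", "b"], [local_max])` yields `[([(1, 3), (3, 5)], "a"), ([], "b")]`: the first series has maxima at times 1 and 3, and the monotonic second series has none.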
For the segmentation algorithm in Figure 5.4, the data must be converted into a format appropriate for segmentation. This means making a single list of tuples each consisting of an instantiated feature and a class label, rather than the original format, which is a list of 2-tuples, each consisting of a list of instantiated features and a class label. Also note that SegAlg could be any segmentation algorithm, for example either of the algorithms discussed in Chapter 4.
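The conversion can be sketched as follows, assuming the data has the (list of instantiated features, class label) shape produced by instantiated feature extraction. The parameter `seg_alg` stands in for SegAlg and may be any callable that maps a flat list of (instantiated feature, class label) pairs to a list of synthetic features. The function names are hypothetical, and `toy_class_centroids` is a deliberately trivial placeholder segmenter (one mean point per class), not either of the Chapter 4 algorithms.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def flatten_for_segmentation(
    data: Sequence[Tuple[List[Any], Any]]
) -> List[Tuple[Any, Any]]:
    """Turn [(features, label), ...] into [(feature, label), ...]."""
    return [(feature, label) for features, label in data for feature in features]


def construct_synthetic_features(
    data: Sequence[Tuple[List[Any], Any]],
    seg_alg: Callable[[List[Tuple[Any, Any]]], List[Any]],
) -> List[Any]:
    # Any segmentation algorithm can be plugged in here.
    return seg_alg(flatten_for_segmentation(data))


def toy_class_centroids(flat: List[Tuple[Tuple[float, ...], Any]]) -> List[Tuple[float, ...]]:
    """Placeholder segmenter: the mean of all instantiated features of each class,
    returned in order of first appearance of the class."""
    sums: Dict[Any, List[float]] = {}
    counts: Dict[Any, int] = {}
    for feature, label in flat:
        if label not in sums:
            sums[label] = [0.0] * len(feature)
            counts[label] = 0
        sums[label] = [acc + x for acc, x in zip(sums[label], feature)]
        counts[label] += 1
    return [tuple(acc / counts[label] for acc in sums[label]) for label in sums]
```

With `data = [([(1, 3), (3, 5)], "a"), ([(2, 2)], "b")]`, flattening gives three (feature, label) pairs, and the toy segmenter returns `[(2.0, 4.0), (2.0, 2.0)]`. Keeping the flattening separate from `seg_alg` means a real segmentation algorithm can be swapped in without touching the data conversion.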
The attribution algorithm shown in Figure 5.5 takes a membership function, such as relative membership or the other membership functions discussed in Section 4.10.1, and, for each synthetic feature, determines whether a particular training instance has an instantiated feature that is similar to it.
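A minimal sketch of binary attribution, assuming instantiated and synthetic features are numeric tuples. As in the description above, the membership function is passed in as a parameter. `within_radius` is a hypothetical distance-threshold membership function used only for illustration; it is not the relative membership of Section 4.10.1. The `with_class` flag reflects the single difference between the training-time and test-time attributor.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Membership = Callable[[Point, Point], bool]


def attribute(instance_features: Sequence[Point],
              synthetic_features: Sequence[Point],
              membership: Membership) -> List[bool]:
    """One boolean attribute per synthetic feature: True if any instantiated
    feature of the instance is judged similar to it by `membership`."""
    return [any(membership(f, s) for f in instance_features)
            for s in synthetic_features]


def attribute_dataset(data: Sequence[Any],
                      synthetic_features: Sequence[Point],
                      membership: Membership,
                      with_class: bool = True) -> List[Any]:
    """Training time (with_class=True): data holds (features, label) tuples and
    the class label is kept. Test time: data holds bare feature lists."""
    rows: List[Any] = []
    for item in data:
        if with_class:
            features, label = item
            rows.append((attribute(features, synthetic_features, membership), label))
        else:
            rows.append(attribute(item, synthetic_features, membership))
    return rows


def within_radius(radius: float) -> Membership:
    """Hypothetical membership: Euclidean distance of at most `radius`."""
    def membership(f: Point, s: Point) -> bool:
        return math.dist(f, s) <= radius
    return membership
```

For instance, with synthetic features `[(2.0, 4.0), (2.0, 2.0)]` and `within_radius(1.0)`, an instance whose only instantiated feature is `(2, 4)` is attributed `[True, False]`, while an instance with no instantiated features gets `False` for every attribute. `math.dist` requires Python 3.8 or later.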