We suspected that part of the cause of TClass's lacklustre accuracy
was the issue of pruning with such a large dataset. Hence we
investigated the effect of setting the minimum
number of instances (
) at each leaf. For example, we tried
.
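To make the effect of a minimum leaf size concrete, the following is a minimal sketch and not the learner used in these experiments. It assumes a toy single-feature tree grown greedily on Gini impurity; the names `gini`, `best_split`, `count_leaves` and the `rows` data are hypothetical and purely illustrative. A split is only admissible if both sides keep at least `min_leaf` instances, so raising `min_leaf` caps how many leaves (and hence rules) can be produced.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, min_leaf):
    """Return (left, right) for the best admissible split, or None.

    rows is a list of (feature_value, label) pairs sorted by feature_value.
    A split is admissible only if each side keeps at least min_leaf rows.
    """
    min_leaf = max(1, min_leaf)
    best = None
    best_score = gini([y for _, y in rows])  # a split must strictly improve on this
    for i in range(min_leaf, len(rows) - min_leaf + 1):
        if rows[i - 1][0] == rows[i][0]:
            continue  # equal feature values cannot be separated by a threshold
        left, right = rows[:i], rows[i:]
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(rows)
        if score < best_score:
            best_score, best = score, (left, right)
    return best


def count_leaves(rows, min_leaf):
    """Grow the toy tree recursively and return its number of leaves."""
    rows = sorted(rows, key=lambda r: r[0])
    split = best_split(rows, min_leaf)
    if split is None:
        return 1
    left, right = split
    return count_leaves(left, min_leaf) + count_leaves(right, min_leaf)


# Synthetic, noisy one-feature data (illustrative only, not ECG data).
rows = [(0, "a"), (1, "a"), (2, "b"), (3, "a"), (4, "a"), (5, "a"),
        (6, "b"), (7, "b"), (8, "a"), (9, "b"), (10, "b"), (11, "b")]

for m in (1, 3, 7):
    print(m, count_leaves(rows, m))
```

With twelve instances, a minimum of 7 per leaf admits no split at all, and a minimum of 3 allows at most four leaves, whereas a minimum of 1 lets the tree fit every noisy run in the data.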
For the PART learner, the error rate actually decreased slightly to
40.5 per cent, while the average number of rules fell from 94 to 24.
A set of 24 rules for recognising ECGs with a 40.5 per cent error
rate, when a human expert has a 30 per cent error rate, is quite an
accomplishment.
Still, it is hard to study the comprehensibility of such rules. Hence we used the binary classification rules approach, as used on the Auslan datasets, to see whether any intelligible concepts could be deduced. The results are shown in Figure 6.37.
To gain some insight as to whether this was a useful rule, we
compared it with the rules used by a commercial ECG classifier based
on an expert system [A97]. The definitions produced show the same
characteristics as the definitions produced in this manner for the
Auslan domains: the first few rules provide a ``first cut'' exclusion.
The third rule looks for a very low minimum x value. This is because
patients with right ventricular hypertrophy (RVH) have a depression in
the S wave [A97], leading to a very low minimum x value. De Chazal
[dC98] (page 179) found that the minimum x value is the most
discriminant feature based on rank-correlation analysis. The other
criteria look for local maxima in the aVL and Z channels that are
unique to RVH cases: the T wave is biphasic, i.e. rather than having a
single maximum, it has two maxima, whereas most normal heartbeats have
one maximum occurring slightly later (around time 0.7) [A97].
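The two kinds of feature mentioned above, a channel's global minimum and the number of local maxima in a stretch of signal, can be sketched as follows. This is only an illustration under stated assumptions, not TClass's feature extraction: the function names `global_minimum` and `local_maxima`, the `min_height` parameter, and the sample values are all hypothetical, and the signals are synthetic rather than real ECG recordings.

```python
def global_minimum(signal):
    """Return (index, value) of the lowest sample in a channel."""
    idx = min(range(len(signal)), key=lambda i: signal[i])
    return idx, signal[idx]


def local_maxima(signal, min_height=0.0):
    """Return indices of strict local maxima at or above min_height."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]
            and signal[i] >= min_height]


# Synthetic T-wave segments (made-up values, for illustration only).
single_peak = [0.0, 0.2, 0.6, 0.9, 0.6, 0.2, 0.0]
double_peak = [0.0, 0.5, 0.8, 0.4, 0.7, 0.3, 0.0]

print(local_maxima(single_peak))   # [3]
print(local_maxima(double_peak))   # [2, 4]

# Synthetic channel with a deep trough standing in for an S-wave depression.
trough = [0.1, -0.2, -1.5, -0.3, 0.2]
print(global_minimum(trough))      # (2, -1.5)
```

A rule of the kind described could then test whether the minimum falls below some threshold, or whether a segment has two maxima rather than one; the thresholds themselves would come from the learner, not from this sketch.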