There has been considerable work on analysing ECGs within the artificial intelligence and machine learning communities. Perhaps the most significant is the work of Bratko et al. on the KARDIO [IB89] methodology and model of the heart. However, it is interesting to note that, in discussing its application in practice, the authors had this to say:
In respect to clinical application of KARDIO, the cardiologists felt that a significant limitation is that KARDIO accepts as input symbolic ECG descriptions rather than the actual ECG signal. Thus the user is at present required to translate the patient's ECG waveform into the corresponding symbolic descriptions. ... In presently available ECG analysers, these difficulties [in extracting symbolic descriptions] lead to unreliable recognition of some of the ECG features.
TClass provides a mechanism for overcoming this problem: it takes as input the ECG signal itself, with only some basic filtering applied. Further, KARDIO was developed to diagnose Type B problems, whereas in this chapter we tackle the harder Type A classification problem.
Our dataset was a random selection of 500 recordings from the full CSE
diagnostic database [WALA$^+$90]. This dataset was also explored
by de Chazal [dC98]. The records come from men and women
with an average age
years. The sampling rate was 500 Hz.
There were seven possible classes: normal (NOR), left ventricular
hypertrophy (LVH), right ventricular hypertrophy (RVH), biventricular
hypertrophy (BVH), anterior myocardial infarction (AMI), inferior
myocardial infarction (IMI) and combined myocardial infarction
(MIX). The classes are not evenly distributed; the distribution is
shown in Table 6.16. The class
labels were determined independently through medical means
(e.g. surgery after the ECGs were recorded), so the class labels can
be assumed to be free of noise.
Each recording consists of 15 channels. These include the three Frank leads (X, Y and Z), which provide a 3D representation of the heart, as well as the raw sensors V1 through V6 and the leads aVF, aVR and aVL. Figure 6.33 shows the data as it reaches the TClass learning algorithm, after the preprocessing steps discussed below.
The focus of de Chazal's thesis was the manual construction of a set of features for the classification of Frank lead electrocardiograms. As part of that project, he also created software for filtering noise out of the ECG signals and for automatically segmenting an ECG recording into beats.
The filter design can be found on page 48 of his thesis. It is designed mainly to remove ``baseline wander'' caused by the patient's respiration, at about 0.5 Hz, and mains interference at 50 Hz. Figure 6.34 shows the effect of the filter on the three orthogonal leads (X, Y and Z) over a sequence of heartbeats. For ease of comparison, we used the same filter as de Chazal.
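As a rough illustration of the kind of filtering described above, the following sketch combines a first-order high-pass filter, which suppresses slow baseline drift, with a second-order notch filter at the mains frequency. This is not de Chazal's filter design, which is given in his thesis. The function names, the 0.5 Hz cutoff, and the notch quality factor are assumptions chosen for illustration only; a first-order high-pass in particular is much gentler than a filter one would use in practice.

```python
import math


def highpass(samples, cutoff_hz, fs_hz):
    """First-order RC high-pass filter; attenuates slow drift below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    alpha = rc / (rc + dt)
    out = []
    prev_x = None
    prev_y = 0.0
    for x in samples:
        # The first output is taken as 0, so a constant offset is removed at once.
        y = 0.0 if prev_x is None else alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def notch(samples, notch_hz, fs_hz, q=30.0):
    """Second-order (biquad) notch filter centred on notch_hz."""
    w0 = 2.0 * math.pi * notch_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Coefficients normalised so that the leading denominator term is 1.
    b0, b1, b2 = 1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0
    a1, a2 = -2.0 * cos_w0 / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def clean_ecg(samples, fs_hz=500.0, wander_cutoff_hz=0.5, mains_hz=50.0):
    """Apply drift removal followed by mains rejection to one ECG channel."""
    return notch(highpass(samples, wander_cutoff_hz, fs_hz), mains_hz, fs_hz)
```

Each of the 15 channels would be passed through `clean_ecg` separately. Because both stages are causal recursive filters, the first second or so of output contains a start-up transient.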
We also used de Chazal's segmentation of the ECG recordings into individual beats. A typical ECG recording contains several heartbeats, and the recording must be split into individual beats before learning can take place. For our data, the ECGs are segmented at the Q wave onset times.
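To make the segmentation step concrete, the sketch below splits a multi-channel recording into beats, given the sample indices of the Q wave onsets. It assumes the onset indices have already been found by some other means; the function name, the dictionary layout of the recording, and the choice to discard the partial beats before the first onset and after the last one are all assumptions made for illustration, not a description of de Chazal's software.

```python
def segment_beats(recording, q_onsets):
    """Split a recording into beats at the given Q wave onset indices.

    recording: dict mapping channel name -> list of samples (equal lengths)
    q_onsets:  strictly increasing sample indices of Q wave onsets
    Returns a list of beats, each a dict with the same channel names.
    Beat k covers samples q_onsets[k] up to, but excluding, q_onsets[k + 1].
    """
    pairs = list(zip(q_onsets, q_onsets[1:]))
    if any(end <= start for start, end in pairs):
        raise ValueError("q_onsets must be strictly increasing")
    beats = []
    for start, end in pairs:
        beats.append(
            {name: samples[start:end] for name, samples in recording.items()}
        )
    return beats


# Toy example: two channels of ten samples with onsets at indices 2, 5 and 9.
toy_recording = {"X": list(range(10)), "Y": list(range(10, 20))}
toy_beats = segment_beats(toy_recording, [2, 5, 9])
```

Three onsets yield two complete beats here, since a beat is only emitted when both its starting onset and the following onset are known.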