Application of Fractal Algorithms to Identify Cardiovascular Diseases in ECG Signals



Introduction
The circulatory system is one of the systems of the human body that fulfills a specific function. Its most important component is the heart, which is responsible for pumping blood to the different organs and systems of the body in order to oxygenate them.
A hectic lifestyle, poor diet, sustained consumption of tobacco and alcohol and, in some cases, hereditary conditions can produce different types of illness that damage the heart and affect the natural rhythm of its heartbeat. As a result, there are several alterations of the cardiac rhythm, which vary in intensity and severity. This work therefore uses ECG signals from patients with the cardiovascular diseases of highest incidence, as reported by the World Health Organization [1], with the aim of identifying those diseases. For this purpose we used the Physionet database [2], from which a total of 60 electrocardiogram records were downloaded, including those corresponding to normal sinus rhythm.
To identify cardiovascular diseases in digitized ECG signals, we chose fractals, a field of mathematics dedicated to the study of signals that exhibit a repetitive characteristic of self-similarity. This theory is also used in the analysis of electroencephalogram (EEG) signals, for example for the early detection of Alzheimer's disease [3], for visual evoked potentials [4], to analyze EEG brain oscillations during the performance of cognitive tasks [5], and to analyze the bimodal pattern of neuronal activity [6]. In addition, principal component analysis was used to reduce the dimensionality of the results obtained from the fractal dimensions of Higuchi and Katz. As references we took the work of [7], which used fractal analysis and chaos theory to detect dynamic changes in a group of 13 ECG signals with healthy and unhealthy segments, mainly when the RR intervals and the ST segments were analyzed, and the work of [8], which applies detrended fluctuation analysis, a procedure based on the theory of fractals, to patients who had suffered cerebrovascular accidents.
The work of [9] describes a local fractal feature based on a matching method for the classification of ECG arrhythmias, in which a signal is matched against representative templates from other ECG signals with different types of arrhythmia using the Euclidean distance. In addition, [10] used the fractal dimension for the automated diagnosis of serious arrhythmias. In the same way, [11] investigated three fractal dimension methods, among them Higuchi and Katz, together with artificial neural networks, as a methodology to predict sudden cardiac death. Finally, [12] proposes two methods for feature extraction from ECG signals, the second of which is a fusion of fractal features by stacking.

Processing and Fractal Dimension
This section briefly describes the four databases used, the filtering step applied to them, and the calculation of the fractal dimension of the ECG signals. These databases were selected because they represent a group of cardiovascular diseases with high incidence in Peru.

Physionet Database
Four databases were used in this article, all obtained from the Physionet website [2]:
• The BIDMC Congestive Heart Failure Database, for severe cases of heart failure, with a sampling frequency of 250 Hz, lead II, and signals of 900,000 samples.
• The MIT-BIH Normal Sinus Rhythm Database, for healthy people, with a sampling frequency of 128 Hz, lead II, and signals of 460,800 samples.
• The St.-Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database, for hypertension, with a sampling frequency of 257 Hz and signals of 462,600 samples.
• The PTB Diagnostic ECG Database, for ischemic heart disease, with a sampling frequency of 1,000 Hz and signals of 115,200 samples.
The PhysioBank ATM tool was used to download the files in *.mat format, taking 15 signals per database so that all four cases are uniform. Figure 1 shows the temporal representation of an ECG signal in its original form, corresponding to a patient with heart failure.

Filtering Stage
The filtering stage attenuates the baseline or DC component that alters the ECG signal. The American Heart Association, in its article Recommendations for the Standardization and Interpretation of Electrocardiograms, recommends a cutoff frequency of 0.67 Hz or less for linear digital filters with zero phase distortion [13]. Therefore, a Butterworth high-pass filter of order 6 with a cutoff frequency of 0.5 Hz was used in order to attenuate the DC component while keeping a stable phase response. Figure 2 compares an original signal from a patient with ischemic heart disease with the result obtained after the filtering stage.
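The filtering step can be sketched as follows. This is a minimal illustration, assuming Python with NumPy and SciPy rather than the Matlab code used in the study. The function name `remove_baseline`, the forward-backward (zero-phase) filtering, and the synthetic test signal are assumptions for illustration; the source does not state how the filter was applied.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def remove_baseline(ecg, fs, cutoff_hz=0.5, order=6):
    """High-pass filter an ECG trace to attenuate baseline drift and DC offset."""
    # Second-order sections are numerically safer than (b, a) coefficients
    # when the cutoff is very low compared with the sampling frequency.
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    # Assumption: forward-backward filtering, which gives zero phase distortion.
    # It applies the filter twice, so the effective magnitude response is squared.
    return sosfiltfilt(sos, np.asarray(ecg, dtype=float))


# Synthetic stand-in for an ECG record (not real data): an 8 Hz component
# riding on a DC offset and a slow 0.05 Hz baseline drift.
fs = 250.0
t = np.arange(0, 20, 1 / fs)
raw = np.sin(2 * np.pi * 8 * t) + 0.8 + 0.5 * np.sin(2 * np.pi * 0.05 * t)
clean = remove_baseline(raw, fs)
```

With a single forward pass (`scipy.signal.sosfilt`) the phase response would no longer be zero, so the choice between the two depends on how strictly the zero-phase recommendation is to be followed.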

Estimation of Fractal Dimensions
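The calculation of the fractal dimension of Higuchi considers a finite set of time-series observations taken at a regular interval, x(1), x(2), ..., x(N), and builds from it a new time series [14], as described by Equation (1):

$$x_m^k = \left\{ x(m),\; x(m+k),\; x(m+2k),\; \ldots,\; x\!\left(m + \left\lfloor \frac{N-m}{k} \right\rfloor k\right) \right\}, \qquad m = 1, 2, \ldots, k \qquad (1)$$

where « m » indicates the initial time value, « k » is the discrete time delay, « N » is the number of observations, and ⌊ ⌋ denotes the integer part.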
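As a hedged sketch of how the Higuchi estimate is usually computed from the subsampled series, the pure-Python function below averages the normalized curve length over the k start offsets and fits the slope of ln L(k) against ln(1/k). It is not the program of [15]; the name `higuchi_fd` and the default `kmax=8` are assumptions, since the source does not state the value of Kmax it used.

```python
import math


def higuchi_fd(x, kmax=8):
    """Estimate the Higuchi fractal dimension of a 1-D sequence.

    kmax is an assumed default. A constant signal has zero curve length
    and makes the logarithm fail.
    """
    n = len(x)
    if kmax < 2 or n < 2 * kmax:
        raise ValueError("need kmax >= 2 and at least 2 * kmax samples")
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # zero-based offset, i.e. m - 1 in one-based notation
            n_steps = (n - 1 - m) // k  # floor((N - m) / k) for one-based m
            # Sum of absolute increments along the subsampled series.
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # Normalize by (N - 1) / (n_steps * k), then divide by k.
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(sum(lengths) / len(lengths)))
    # Least-squares slope of ln L(k) versus ln(1/k) is the FD estimate.
    mean_x = sum(log_inv_k) / kmax
    mean_y = sum(log_len) / kmax
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mean_x) ** 2 for a in log_inv_k)
    return num / den
```

A straight line should give a value of 1, and uncorrelated noise a value close to 2, which is a quick sanity check before applying the function to ECG frames.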
The algorithm proposed by Higuchi to estimate the fractal dimension is shown as a flow chart in Figure 3. This flow chart is based on the program coded by [15].
The fractal dimension of Katz is given by Equation (2):

$$D = \frac{\log_{10}(L)}{\log_{10}(d)} \qquad (2)$$

where « d » is the diameter of the curve and « L » is the length of the curve, defined as the sum of the distances between successive points. In addition, Katz adds that, by practical convention, in order to discretize the space and normalize the fractal dimensions, the standard unit is defined as the smallest convolution size of interest in a form; for waveforms, this is the average distance between 2 successive points, « a ». Using « a » and solving the formula for the fractal dimension D [16] gives Equation (3):

$$D = \frac{\log_{10}(n)}{\log_{10}(n) + \log_{10}(d/L)} \qquad (3)$$

where « n » is the number of steps in the curve (L / a), and « a » is the average length of the unit. The flow chart used to calculate the fractal dimension of Katz is shown in Figure 4. This flow chart is based on the program coded by [17].
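A minimal sketch of the Katz estimate, D = log10(n) / (log10(n) + log10(d / L)) with n = L / a, is given below. It is not the program of [17]. The name `katz_fd` is hypothetical, and the code assumes that the points of the waveform are (i, x[i]) with unit spacing on the time axis and Euclidean distances; some implementations use only the amplitude differences instead.

```python
import math


def katz_fd(x):
    """Estimate the Katz fractal dimension of a 1-D waveform."""
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    n = len(x) - 1  # number of steps in the curve
    # Assumption: points are (i, x[i]) with unit spacing along the time axis.
    # L: total curve length, the sum of distances between successive points.
    length = sum(math.hypot(1.0, x[i + 1] - x[i]) for i in range(n))
    # d: diameter, the largest distance from the first point to any other point.
    diameter = max(math.hypot(i, x[i] - x[0]) for i in range(1, n + 1))
    # D = log10(n) / (log10(n) + log10(d / L)), where n = L / a.
    return math.log10(n) / (math.log10(n) + math.log10(diameter / length))
```

For a straight line the diameter equals the length, so the second logarithm vanishes and the result is 1; a waveform that folds back on itself has d < L and therefore D > 1.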

Fractal characteristic of ECG signal
According to [18], methods based on fractal geometry have been used satisfactorily in the analysis of cardiac signals. These studies show that ECG signals can be modeled as self-related fractal sets whose characteristics can be determined using the fractal dimension. In addition, the fractal nature of the ECG signal can be attributed to the self-similar pattern of cardiac rhythm activity, so a change in the rhythmic pattern can be measured using the fractal dimension, since the self-similar structure will change.
Likewise, [19], in the article A Healthy Heart is a Fractal Heart, states that the heart is part of a large feedback system whose dynamics are non-linear, non-stationary and multiscale. Consequently, the heartbeat is one of the most complex signals in nature. Thus, ECG records obtained from sick and elderly people show great losses of complexity with respect to the beats obtained from healthy people. That is, the values of the fractal index tend to 1 when people are sick and, on the contrary, move away from unity when they are healthy.
Therefore, in a group of 4 ECG signals, the most irregular wave is signal B, which belongs to a healthy person. Signal D corresponds to atrial fibrillation, while signals A and C belong to patients with severe heart failure. They are shown in Figure 5.
However, before calculating the fractal dimensions of Katz and Higuchi, it is necessary to check the performance of the algorithms. For this, we used the Weierstrass function to generate synthetic signals that verify the operation of the proposed algorithms. The procedure used by [20] was repeated: synthetic signals with known (theoretical) fractal dimensions are generated using the Weierstrass cosine function, and these signals are then processed with the algorithms of Katz and Higuchi. The results are presented in tabular and graphical form in Table 1 and Figure 6.
In the graphical representation, a quasi-linear behavior of the Higuchi algorithm was observed with respect to the synthetic signal. On the other hand, the graphical representation corresponding to the Katz algorithm showed a slight deviation starting at a fractal dimension of 1.5, with an increasing behavior similar to that obtained by [20].
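A synthetic verification signal of this kind can be sketched with the Weierstrass cosine function W_H(t) = Σ_{i=0..M} γ^(-iH) cos(2π γ^i t), whose theoretical fractal dimension is D = 2 - H for 0 < H < 1. The parameter values γ = 5 and M = 26, the sampling of t over [0, 1), and the function names are assumptions for illustration; the source does not list the values it used.

```python
import math


def weierstrass_cosine(n_samples, hurst, gamma=5.0, n_terms=26):
    """Sample the Weierstrass cosine function on n_samples points of [0, 1).

    gamma and n_terms are assumed values, not taken from the source.
    """
    if not 0.0 < hurst < 1.0:
        raise ValueError("hurst must lie strictly between 0 and 1")
    signal = []
    for j in range(n_samples):
        t = j / n_samples
        # Sum of cosines whose amplitudes decay as gamma^(-i * H)
        # while their frequencies grow as gamma^i.
        signal.append(sum(gamma ** (-i * hurst)
                          * math.cos(2.0 * math.pi * gamma ** i * t)
                          for i in range(n_terms + 1)))
    return signal


def theoretical_fd(hurst):
    """Theoretical fractal dimension of the Weierstrass cosine function."""
    return 2.0 - hurst
```

Sweeping `hurst` over a grid and comparing each estimated FD with `theoretical_fd(hurst)` reproduces the kind of comparison reported in Table 1 and Figure 6, although the numerical values depend on the assumed parameters.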
With the four previously filtered databases, we proceeded to calculate the fractal dimensions of Katz and Higuchi in order to differentiate the signals belonging to healthy and sick people. These algorithms are of the unsupervised type, because they do not perform learning and are only used to discriminate or differentiate the data.
For the estimation of both fractal dimensions, each signal in the group of 15 was subdivided into 10 equal sections. For example, in the Physikalisch-Technische Bundesanstalt (PTB) database, corresponding to ischemic heart disease, each signal contains 115,200 samples. However, only the first 100,000 samples were selected and subdivided into 10 equal parts to obtain frames or sections of 10,000 samples with an overlap of 1,000 samples.
As observed in Figures 7, 8, 9 and 10, there are 10 fractal dimensions (FD) for each of the 15 signals in each database. Since these values are very close to each other, we calculated the arithmetic mean of the 10 FD. The means are shown in Tables 2 and 3.
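The framing and averaging step can be sketched as below. The helper names `split_frames` and `mean_fd` are hypothetical, and the code assumes that consecutive frames start `frame_len - overlap` samples apart; the source does not spell out the windowing, so the number of frames produced here may differ from the 10 values per signal it reports.

```python
def split_frames(signal, frame_len, overlap):
    """Cut a sequence into full-length frames that share `overlap` samples."""
    if not 0 <= overlap < frame_len:
        raise ValueError("overlap must satisfy 0 <= overlap < frame_len")
    hop = frame_len - overlap  # assumed distance between frame starts
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def mean_fd(signal, fd_func, frame_len, overlap):
    """Apply an FD estimator to every frame and return (mean, per-frame values)."""
    values = [fd_func(frame) for frame in split_frames(signal, frame_len, overlap)]
    if not values:
        raise ValueError("signal is shorter than one frame")
    return sum(values) / len(values), values
```

Here `fd_func` stands for any estimator that maps a sequence to a number, for example a Katz or Higuchi implementation, called as `mean_fd(ecg[:100000], fd_func, 10000, 1000)`.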
However, the arithmetic mean does not make it easy to discriminate between one type of ECG signal and another, because the results are scattered within the same type of ECG signal; only the normal signal and ischemic heart disease show values that are less scattered. In contrast, for the Higuchi algorithm the arithmetic means are closer together and less scattered for each of the four signals, especially for the normal sinus rhythm (healthy people). To better visualize and differentiate the results, we calculated the variance of the FD in order to see how these results deviate from their average. Table 4 shows the variance of the averages obtained from the FD of Higuchi and Katz for the four types of signals used. For Higuchi's FD, the signal with the greatest variation is arterial hypertension and the one with the least variation is normal sinus rhythm. Similarly, for the Katz algorithm, the FD with the greatest variation is also arterial hypertension and the one with the least variation is ischemic heart disease.
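The variance of the averages can be computed as in the short sketch below. The function name `fd_spread` is hypothetical, and the use of the sample variance (division by N - 1, the default of Matlab's var) is an assumption, since the source does not say which estimator was used.

```python
import statistics


def fd_spread(mean_fd_per_signal):
    """Return (mean, sample variance) of the per-signal FD averages of one database."""
    # Assumption: sample variance with N - 1 in the denominator.
    return (statistics.mean(mean_fd_per_signal),
            statistics.variance(mean_fd_per_signal))
```

Calling `fd_spread` once per database and per algorithm gives one variance per cell of a table laid out like Table 4.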
Then, in order to reduce the dimensionality, we applied Principal Component Analysis (PCA) to the obtained data, which gave a better differentiation of the databases studied. For this, we created a matrix containing the FD values to be analyzed. The PCA results are plotted using the values of the "score" matrix generated by the PRINCOMP function of Matlab. This score matrix contains the coordinates, or projections, of the data on the principal components. In a first stage, we used PCA to compare the signals in pairs, so that the FD of the signals corresponding to each disease were compared with those of the healthy person. We show some two-dimensional results of applying the PCA algorithm to the FD of Katz and Higuchi, respectively, for a frame of 10,000 samples and an overlap of 1,000.
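An equivalent of the score matrix can be sketched in Python with NumPy through the singular value decomposition of the centered data. This is an illustration rather than the Matlab code of the study: the function name `pca_scores` is hypothetical, and the layout of the matrix (one row per signal, one column per frame-level FD value) is an assumption, since the source does not describe it.

```python
import numpy as np


def pca_scores(features, n_components=2):
    """Project a (signals x FD values) matrix onto its leading principal components."""
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)  # PCA works on column-centered data
    # SVD of the centered data: rows of vt are the principal directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s  # coordinates of each row in the principal-component basis
    explained_var = s ** 2 / (x.shape[0] - 1)  # variance along each component
    return scores[:, :n_components], vt[:n_components], explained_var[:n_components]
```

Plotting the first two columns of the returned scores for the rows of two databases gives a pairwise comparison of the kind described above; the sign of each component is arbitrary and may differ between software packages.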
For Katz's Fractal Dimension, the comparison of heart failure with the normal signal showed a tendency for the two to group, but several scattered points were also observed. It is shown in Figure 11 (frame = 10,000 and overlap = 1,000). On the other hand, for Higuchi's Fractal Dimension, the comparison of heart failure with the normal signal showed a better separation between the 2 signals. It is shown in Figure 12.
In further tests with the algorithms, we reduced the frame size to 1,000 and the overlap to 100. This gave an FD calculation over 100 segments, and therefore more features were analyzed with the PCA algorithm. In this case, again, the Higuchi algorithm performed better at analyzing and differentiating the signals.
For Higuchi's Fractal Dimension, the comparison of heart failure with the normal signal showed an excellent differentiation of the principal components. It is shown in Figure 13.
For Katz's Fractal Dimension, the comparison of heart failure with the normal signal showed an improvement in the differentiation of the principal components. It is shown in Figure 14 (frame = 1,000 and overlap = 100).
Therefore, reducing the frame size and the overlap achieves a greater separation of the principal components.
Finally, we show, in 2D and 3D graphs, the principal components of the 4 signals studied for Higuchi's Fractal Dimension with a frame of 1,000 and an overlap of 100, where it can be seen that heart failure is mixed with hypertension and ischemic heart disease. They are shown in Figures 15 and 16. On the other hand, for Katz's Fractal Dimension with a frame of 1,000 and an overlap of 100, the 2D and 3D representations of the principal components of the 4 signals studied show greater mixing between them, making their separation and identification more difficult. They are shown in Figures 17 and 18.

Conclusions
The Butterworth high-pass filter was used to eliminate baseline variations by filtering out the frequencies below 0.5 Hz, the range in which unwanted signals are present. In addition, the algorithms of Katz and Higuchi were implemented in the Matlab programming language using the corresponding equations, which are the basis of calculation for these algorithms.
The PCA algorithm was applied successfully, since it made it possible to reduce the dimensionality of the FD and to obtain the principal components that represent the 4 types of signals (heart failure, hypertension, ischemic heart disease and normal signal).
The results were represented graphically with the principal components in two and three dimensions for the Katz and Higuchi algorithms. The algorithm of Higuchi gave the better performance. This is because Higuchi's algorithm calculates the FD through summations and repeated averages over the segment or distance at which the FD is calculated, while Katz uses the basic form of the calculation proposed by Mandelbrot when defining a fractal, which consists of a division of logarithms. This performance is reflected in the variance of the FD averages shown in Table 4.
In addition, the results improve when the frame size and the overlap are reduced to 1,000 and 100, respectively. This improvement is due to the fact that the PCA components do not appear dispersed; on the contrary, they remain grouped, which facilitates their differentiation. Even with a frame size of 10,000 and an overlap of 1,000, the Higuchi algorithm achieved a better separation of the PCA components for the normal signal and heart failure databases, as shown in Figures 11 and 12. When the frame size is reduced to 1,000 and the overlap to 100, the Higuchi algorithm again achieves a better grouping of the components, as shown in Figures 16 and 18.
On the other hand, the work of [20] shows that Katz's algorithm is the most consistent method for the discrimination of epileptic states from the intracranial EEG (IEEG), likely due to its exponential transformation of FD values and its relative insensitivity to noise. The results of this article therefore differ from those of [20], since here Higuchi's algorithm performed better.
Finally, it can also be concluded that the use of fractal dimensions together with other techniques, such as artificial neural networks, frequency analysis and principal component analysis, among others, offers further applications for identification and classification in ECG and EEG signals, such as the early detection of Alzheimer's disease, the identification of cardiovascular diseases, the prediction of sudden cardiac death, and the classification of heart signals [3], [7], [11], [12].