Detecting abnormality in heart dynamics from multifractal analysis of ECG signals

Snehal M. Shekatkar; Yamini Kotriwar; K.P. Harikrishnan; G. Ambika

arXiv:1705.00121·q-bio.TO·September 5, 2018


TL;DR

This paper introduces a novel multifractal analysis method for ECG signals that effectively distinguishes healthy from unhealthy heart dynamics and employs machine learning for accurate abnormality detection.

Contribution

It presents a new approach that uses the multifractal spectra of ECG signals, including the information in their amplitude variations, to identify heart abnormalities, in contrast to conventional nonlinear analyses based on R-R intervals.

Findings

1. Multifractal spectra can clearly separate healthy and unhealthy subjects.
2. The machine learning model predicts group labels with high accuracy.
3. ECG analysis reveals multifractal structure in heart dynamics.



Equations

$$\alpha = \frac{d}{dq}\left[(q-1)D_{q}\right]$$

$$f(\alpha) = q\alpha - (q-1)D_{q}$$

$$f(\alpha) = A\,(\alpha-\alpha_{1})^{\gamma_{1}}\,(\alpha_{2}-\alpha)^{\gamma_{2}}$$

$$\delta_{(n)} = \frac{1}{N}\sum_{i=1}^{N}\left(x_{o}(i) - x_{(n)}(i)\right)^{2}$$

$$s(t_{k}) = \frac{c(t_{k}) - c_{\text{min}}}{c_{\text{max}} - c_{\text{min}}}$$

$$x_{i} = [s(t_{i}),\; s(t_{i}+\tau),\; s(t_{i}+2\tau),\; \dots,\; s(t_{i}+(M-1)\tau)]$$


Full text


Lead author: Shekatkar

Significance Statement

We address an important topic in clinical sciences: extracting quantifiers from measured electrocardiogram (ECG) data to distinguish normal heart rhythms from abnormal ones. Multifractal measures computed from discretized ECG data, combined with a machine learning approach, provide fairly accurate indications of abnormality in cardiac rhythms. Further, the subtle variations in beat shape captured by analyzing beat-replicated data are relevant for detecting variations in normal heart dynamics. The analysis can enhance our understanding of possible transitions in the nonlinear dynamics underlying ECG data.

Author contributions: G.A. and K.P.H. conceived the research. Y.K. downloaded and detrended the data. S.M.S. automated the codes and performed the analysis. G.A. and S.M.S. prepared the manuscript. All authors discussed and reviewed the manuscript.

The authors declare no conflict of interest.

To whom correspondence should be addressed. E-mail: g.ambika@iiserpune.ac.in


Snehal M. Shekatkar

Indian Institute of Science Education and Research, Pune-411008, India

Yamini Kotriwar

Indian Institute of Science Education and Research, Pune-411008, India

K.P. Harikrishnan

Department of Physics, The Cochin College, Cochin-682002, India

G. Ambika

Indian Institute of Science Education and Research, Pune-411008, India

Abstract

The characterization of heart dynamics with a view to distinguishing abnormal from normal behavior is an interesting topic in clinical sciences. Here we present an analysis of electrocardiogram (ECG) signals obtained under controlled conditions from several healthy and unhealthy subjects using the framework of multifractal analysis. Our analysis differs from conventional nonlinear analysis in that it extracts and quantifies the information contained in the amplitude variations of the signal. The results reveal that the attractor underlying the dynamics of the heart has a multifractal structure and that the resulting multifractal spectra clearly separate healthy subjects from unhealthy ones. We use a supervised machine learning approach to build a model that predicts the group label of a new subject with very high accuracy on the basis of the multifractal parameters. By comparing the range of scaling indices in the multifractal spectra with that of beat-replicated data from the same ECG, we show how each ECG can be checked for abnormality through variations within itself.

Keywords: Nonlinear time series analysis | Multifractals | ECG data | Machine learning


The complexity of many physiological rhythms originates from the underlying complex nonlinear dynamical processes (1, 2, 3, 4, 5, 6, 7). The various levels of this complexity and their variations, if properly discerned, will be useful in understanding the abnormalities that lead to many pathological cases (8, 9, 10). A mathematical representation of this complexity using dynamical equations is most often not realizable because of the many interacting variables and uncertain parameters involved (11, 12). Hence the only way to study the dynamics in such situations is to rely on information that can be deciphered from signals obtained from these systems (13, 14, 15). Over the last few decades, several physiological signals, such as EEG, ECG and fMRI, have been studied with techniques from the broad area of nonlinear time series analysis (16, 17, 13).

It is established that most cardiovascular diseases arise from changes in the dynamics of the heart. A healthy heart is a complex system with a fractal nature (18, 19, 20), but cardiac abnormalities or malfunctions can cause subtle changes in its complexity. However, complexity-related quantifiers have not yet reached the clinic for effective diagnostics and therapy. This requires a way of characterizing cardiac complexity that distinguishes healthy from pathological cases. In the present work, we report how measures derived from the multifractal spectrum of ECG signals can be used as a promising tool for diagnosing abnormalities of the heart.

A few studies have reported related analyses of ECG signals (16, 21, 18, 19). Most of them, however, deal with the peak-to-peak time intervals (R-R intervals) (22, 23) and hence do not use the information contained in the amplitude variations of the signals. This motivates us to carry out a detailed multifractal analysis that yields measures characterizing the amplitude variations themselves. Such an analysis also has an advantage over R-R interval analysis: because every sample of the signal contributes, rather than one value per beat, the required recording time is smaller by orders of magnitude.

As already mentioned, the dynamics underlying many natural processes are essentially deterministic but highly nonlinear and complex. Since the dynamical equations, or even the effective dimension of the system, are generally not known, we have to rely on observational data of one of the variables, or on time series of average responses such as the ECG, to characterize the dynamics. The first step in the analysis is the reconstruction of the underlying attractor from the time series of a single variable (13) based on Takens' theorem (24, 25). Over the last few decades this has become a mature field, and a wide variety of methods have been developed (24, 25, 26, 27, 28, 29, 30). These methods have found successful applications in diverse fields such as astrophysics (31), physiology (21, 32), atmospheric sciences (33, 34), geology (35) and stock markets (36). Among them, an important class of methods characterizes the complex dynamics using multifractal analysis (13, 37).

We identify two measures computed from the multifractal spectrum whose ranges are distinctly different for healthy and diseased cases. This makes the proposed analysis effective for clinical purposes, where it can offer better accuracy than visual inspection. We illustrate this by applying the analysis to ECG signals from a number of normal and pathological subjects, and we validate our conclusions using a supervised machine learning approach and comparisons with beat-replicated data from the individual ECGs.

1 Phase space reconstruction from ECG data

ECG data of $97$ unhealthy and $32$ healthy subjects, obtained from the PhysioBank database (38), are pre-processed to make them suitable for the analysis (see Methods). From each dataset we use the ECG time series of the six chest electrodes (channels) $v_{1}$ to $v_{6}$. As a preliminary analysis, we carry out the usual statistical and linear analysis and obtain the power spectra of all the datasets using the Fast Fourier Transform (FFT) algorithm. The frequency with maximum power for each dataset, and the distribution of these peak frequencies over all subjects, are summarized in Fig. 1 for the six chest electrodes. The peak frequencies of the two groups fall in similar ranges, so they cannot conclusively distinguish the groups.
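The peak-frequency computation summarized in Fig. 1 can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: `peak_frequency` is a hypothetical name, the 1000 Hz sampling rate is taken from the data description in Methods, and the zero-frequency bin is skipped so that a residual offset is not reported as the peak.

```python
import numpy as np

def peak_frequency(signal, fs=1000.0):
    """Return the frequency (Hz) carrying the most power in a real signal.

    signal : 1-D array of uniformly sampled values
    fs     : sampling rate in Hz (1000 Hz assumed, as for the data used here)
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                          # remove the mean before the FFT
    power = np.abs(np.fft.rfft(x)) ** 2       # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    k = 1 + np.argmax(power[1:])              # skip the DC bin at index 0
    return freqs[k]


# Usage: a synthetic 1.2 Hz oscillation (about 72 beats per minute) sampled
# for 60 s at 1000 Hz, i.e. 60000 points as in the recordings used here.
t = np.arange(60000) / 1000.0
print(peak_frequency(np.sin(2 * np.pi * 1.2 * t)))   # about 1.2
```

With 60 s of data the frequency resolution is $1/60$ Hz, so the reported peak is only as precise as that bin spacing.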

Hence we resort to methods of nonlinear analysis and reconstruct the phase space structure of the system's dynamics from its discretely sampled time series $s(t_{k})$. To display the resulting phase space structure, or dynamical attractor, we use singular-value decomposition (SVD) (39, 40) (see Methods for details of the embedding). Fig. 2 shows a few representative embedded attractors from the healthy and unhealthy groups. Each attractor is plotted along the mutually uncorrelated coordinates obtained from the SVD.
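One common way to obtain such SVD coordinates is to stack the delay vectors as rows of a trajectory matrix, centre its columns and project onto the leading right singular vectors. The Python sketch below assumes this construction; the function names, the toy two-frequency signal and the values of $M$ and $\tau$ are illustrative and not taken from the paper.

```python
import numpy as np

def delay_matrix(s, M, tau):
    """Stack delay vectors [s[i], s[i+tau], ..., s[i+(M-1)tau]] as rows."""
    s = np.asarray(s, dtype=float)
    n_vec = s.size - (M - 1) * tau
    return np.column_stack([s[j * tau : j * tau + n_vec] for j in range(M)])

def svd_projection(X, n_components=3):
    """Project the rows of X onto its leading singular directions.

    Returns the projected coordinates and all singular values (descending).
    """
    Xc = X - X.mean(axis=0)                   # centre each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T, S


# Usage with a toy quasi-periodic signal
t = np.linspace(0, 40 * np.pi, 4000)
s = np.sin(t) + 0.5 * np.sin(2.3 * t)
X = delay_matrix(s, M=4, tau=25)
Y, S = svd_projection(X, n_components=3)
print(X.shape, Y.shape)
```

Because the projected coordinates equal the leading columns of $U S$, their cross-products vanish, which is why the plotting axes are mutually uncorrelated.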

Since the complex structure of the attractor inherently contains information about the complexity of its dynamics, it is important to characterize it quantitatively. Several quantifiers have been studied in this context over the years (41), the most effective among them being the set of fractal dimensions. In the following sections, we describe how the correlation dimension is used along with surrogate analysis to establish the nonlinear nature of the underlying dynamics, and how the geometrical complexity of the attractor is quantified using multifractal measures.

2 Correlation dimension and Surrogate analysis

We reconstruct the phase space structure of the underlying dynamics in an embedding space of dimension $M$, with delay vectors constructed from the discretely sampled ECG data using a delay time $\tau$. The dimension $M$ is chosen as the value at which a fractal measure, such as the correlation dimension $D_{2}$, saturates. The distributions of the saturated $D_{2}$ values for the healthy and unhealthy groups are shown in Fig. 3. Since all the $D_{2}$ values are less than $4$, for uniformity we use $M=4$ as the embedding dimension for all the signals.
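The paper does not state which estimator of $D_{2}$ it uses. A standard choice is the Grassberger-Procaccia correlation sum, sketched below in Python under that assumption: $C(r)$ is the fraction of distinct pairs of embedded vectors closer than $r$, and $D_{2}$ is the slope of $\log C(r)$ against $\log r$ over a scaling region. The radii `r_min` and `r_max` are assumed inputs to be chosen by inspecting that plot, and the brute-force distance matrix (with no Theiler window for temporally close pairs) is only suitable for a few thousand vectors.

```python
import numpy as np

def correlation_sum(X, radii):
    """Grassberger-Procaccia correlation sum C(r) for each r in radii.

    X : (N, M) array of embedded vectors (memory grows as N**2)
    C(r) is the fraction of distinct pairs at Euclidean distance below r.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(X), k=1)          # each pair counted once
    pair_d = np.sort(d[iu])
    counts = np.searchsorted(pair_d, radii, side="left")   # pairs with d < r
    return counts / pair_d.size

def correlation_dimension(X, r_min, r_max, n_r=12):
    """Estimate D2 as the slope of log C(r) versus log r on [r_min, r_max]."""
    radii = np.logspace(np.log10(r_min), np.log10(r_max), n_r)
    C = correlation_sum(X, radii)
    if np.any(C == 0):
        raise ValueError("r_min too small: some C(r) are zero")
    slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
    return slope


# Usage: random points on a circle form a one-dimensional curve
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 800)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
print(correlation_dimension(circle, 0.05, 0.5))   # close to 1 for a curve
```

Repeating the estimate for $M = 1, 2, \dots$ on delay vectors of the same signal and noting where it stops increasing mirrors the saturation criterion described above.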

Before undertaking any nonlinear analysis, it is important to verify that the observed time series indeed results from an underlying nonlinear process. We check the nonlinear and deterministic nature of the underlying dynamics using the statistically rigorous method of surrogate analysis (26, 27). For this, we generate surrogate data using the TISEAN package (42). Using the correlation dimension as the discriminating measure, we plot in Fig. 4 its values for the original signal and for the generated surrogates as a function of the embedding dimension $M$. Since the $D_{2}$ values of the surrogate data differ significantly from those of the original signal, we conclude that the signal arises from an underlying nonlinear process.
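The paper generates its surrogates with TISEAN; the sketch below is not a reimplementation of that package but a simpler, swapped-in variant, the phase-randomized Fourier surrogate, written in Python with NumPy. It keeps the magnitude of every Fourier coefficient (and hence the power spectrum and linear autocorrelation) while randomizing the phases. The function name and seeding convention are assumptions.

```python
import numpy as np

def phase_randomized_surrogate(x, rng=None):
    """Fourier-transform surrogate: same power spectrum, random phases.

    The surrogate is consistent with a linear Gaussian process having the
    same autocorrelation as x; a nonlinear statistic (e.g. D2) that differs
    clearly between x and its surrogates argues against that null hypothesis.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, X.size)
    phases[0] = 0.0                  # keep the mean (DC term) unchanged
    if n % 2 == 0:
        phases[-1] = 0.0             # the Nyquist term must stay real
    return np.fft.irfft(X * np.exp(1j * phases), n=n)


# Usage: surrogates of a random walk keep its spectrum but not its phases
rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(1000))
y = phase_randomized_surrogate(x, rng=2)
```

In a typical test, $D_{2}$ would be computed for the original series and for, say, $19$ surrogates; if the original value lies outside the range spanned by the surrogates, the linear null hypothesis is rejected at the 5% level for a one-sided test.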

3 Multifractal analysis

The reconstructed phase-space attractors of many complex systems, such as the heart, possess a multifractal structure, which is characterized by a set of generalized dimensions $D_{q}$; the non-uniformity in the distribution of points on the attractor becomes evident through the values of $D_{q}$ for different $q$. The local scaling properties of the attractor, in turn, are captured by a spectrum of singularities related to the probability measure on the coarse-grained attractor. Thus, if the attractor is covered by boxes of size $r$, the probability of points in the $i^{\text{th}}$ box scales as $p_{i}(r)\sim r^{\alpha_{i}}$. For a multifractal, the range of scaling indices $\alpha_{i}$ present is a measure of its complexity. The number of boxes with the same $\alpha$ scales as $N_{\alpha}(r)\sim r^{-f(\alpha)}$. Both $(D_{q},q)$ and $(f(\alpha),\alpha)$ provide analogous characterizations and are related by a Legendre transformation (30):

$$\alpha = \frac{d}{dq}\left[(q-1)D_{q}\right] \tag{1}$$

$$f(\alpha) = q\alpha - (q-1)D_{q} \tag{2}$$

Thus, a convenient way of calculating the multifractal spectrum is to compute the generalized dimensions $D_{q}$ and then use equations (1) and (2) to obtain $\alpha$ and $f(\alpha)$. An algorithmic approach for this has been given by Harikrishnan et al. (30); here we follow the same method, with suitable modifications for ECG signals, using the following mathematical form for the $f(\alpha)$ spectrum:

$$f(\alpha) = A\,(\alpha-\alpha_{1})^{\gamma_{1}}\,(\alpha_{2}-\alpha)^{\gamma_{2}}$$

in which, as described in (30), only four of the five parameters $A,\alpha_{1},\alpha_{2},\gamma_{1},\gamma_{2}$ are independent. These four parameters provide a unique characterization of the $f(\alpha)$ spectrum of the multifractal.
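Equations (1) and (2) can be applied numerically once $D_{q}$ has been estimated on a grid of $q$ values. The Python sketch below assumes a finely and evenly spaced grid and uses finite differences for the derivative; it does not include the fitting procedure of Harikrishnan et al. (30), and `legendre_spectrum` is a hypothetical name. The usage lines test it on the binomial cascade, a textbook multifractal, not on ECG data.

```python
import numpy as np

def legendre_spectrum(q, Dq):
    """Convert generalized dimensions D_q into the f(alpha) spectrum.

    With tau(q) = (q - 1) * D_q:
        alpha    = d tau / dq          (equation 1)
        f(alpha) = q * alpha - tau(q)  (equation 2)
    The derivative uses finite differences, so q should be finely sampled.
    """
    q = np.asarray(q, dtype=float)
    tau = (q - 1.0) * np.asarray(Dq, dtype=float)
    alpha = np.gradient(tau, q)
    f = q * alpha - tau
    return alpha, f


# Usage: the binomial cascade with weights p and 1 - p, for which
# (q - 1) D_q = -log2(p**q + (1 - p)**q).
p = 0.3
q = np.linspace(-20, 20, 4000)             # grid chosen so that q = 1 is not hit
Dq = -np.log2(p ** q + (1 - p) ** q) / (q - 1)
alpha, f = legendre_spectrum(q, Dq)
print(alpha.min(), alpha.max(), f.max())   # alpha spans roughly [0.51, 1.74]; max f near 1
```

For a monofractal ($D_{q}$ constant) the same function returns a single point, $\alpha = f(\alpha) = D_{q}$, which is a quick sanity check on any estimated spectrum.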

In our numerical computation of the four parameters from the multifractal analysis of ECG data, we find that the errors in $\alpha_{2}$ and $\gamma_{2}$ are considerable. This is because they are derived from the part of the $f(\alpha)$ curve obtained from $D_{q}$ with negative values of $q$, which corresponds to sparse regions of the attractor. With a finite length of data, the number of vectors in these sparse regions becomes too small, resulting in larger error bars on the $D_{q}$ values. For this reason, in drawing conclusions from the results, we concentrate mostly on the $\alpha_{1}$ and $\gamma_{1}$ values, which are associated with the dense regions of the attractor.

The difference $\alpha_{2}-\alpha_{1}$, which measures the width of the $f(\alpha)$ spectrum, gives the range of scaling indices. Fig. 6 shows the distributions of this difference for the healthy and unhealthy groups. The spectra tend to be wider for healthy hearts across all channels, indicating greater complexity.

3.1 Parameter planes

In Fig. 7, we present scatter plots of the $\alpha_{1}$ and $\gamma_{1}$ values for electrodes $v_{1}$ to $v_{6}$. The green circles represent subjects identified as healthy in the PhysioNet database and the red squares represent the unhealthy ones. A few cases for which a good fit could not be obtained have been discarded from these parameter planes. For visualization, we also show estimated kernel densities of the two groups in the background. As these plots show, the multifractal analysis separates almost every case correctly, with healthy and unhealthy cases falling into different clusters in the $\alpha_{1}$-$\gamma_{1}$ planes.

3.2 Blind data testing and success rate

The separation of the two groups into different clusters becomes useful only if it helps us predict the group label (healthy or unhealthy) for new, unseen data. This is a standard supervised classification problem in machine learning, and we use support vector classification (SVC) with an RBF kernel (43) to find the regions of the $\alpha_{1}$-$\gamma_{1}$ planes corresponding to the two groups. The known group labels are used as training data for the algorithm, which then divides the parameter plane into a healthy and an unhealthy region. The regions so obtained for the different channels are shown in Fig. 8.

As another pair of quantifiers, we consider the parameter plane $\alpha_{1}$-$\alpha_{0}$, where $\alpha_{0}$ is the value of $\alpha$ at which the $f(\alpha)$ curve attains its maximum, and perform a similar analysis. The resulting regions are shown in Fig. 9.
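When the fitted form of the $f(\alpha)$ spectrum is used, with $\alpha_{1}<\alpha<\alpha_{2}$ and positive $A$, $\gamma_{1}$, $\gamma_{2}$, the location $\alpha_{0}$ of the maximum follows in closed form by setting the derivative of $\ln f(\alpha)$ to zero. This is a consequence of the fit function, offered for illustration; the paper does not say how $\alpha_{0}$ was computed.

$$\ln f(\alpha) = \ln A + \gamma_{1}\ln(\alpha-\alpha_{1}) + \gamma_{2}\ln(\alpha_{2}-\alpha)$$

$$\frac{d\ln f}{d\alpha} = \frac{\gamma_{1}}{\alpha-\alpha_{1}} - \frac{\gamma_{2}}{\alpha_{2}-\alpha} = 0 \quad\Longrightarrow\quad \alpha_{0} = \frac{\gamma_{1}\alpha_{2} + \gamma_{2}\alpha_{1}}{\gamma_{1}+\gamma_{2}}$$

Since the derivative decreases monotonically on $(\alpha_{1},\alpha_{2})$, this stationary point is the unique maximum. Thus $\alpha_{0}$ is a weighted mean of the endpoints $\alpha_{1}$ and $\alpha_{2}$; for $\gamma_{1}=\gamma_{2}$ it is the midpoint of the spectrum.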

The regions shown in Fig. 8 and Fig. 9 are obtained by training the SVC algorithm on the whole data. However, this does not tell us how well the algorithm would perform on unseen data. To check this, we split the data into two parts, a training set and a test set, train the algorithm on the training set, and ask it to predict the labels of the test set. To measure the success, we calculate the true positive rate ($t_{p}$) and the true negative rate ($t_{n}$). Here the true positive rate is defined as the fraction of actual healthy cases that are correctly identified as healthy. Similarly, the true negative rate is the fraction of actual unhealthy cases that are correctly identified as unhealthy. We then define the accuracy of the algorithm as $\text{Accuracy}=t_{p}\times t_{n}$.

This definition of accuracy ensures that both groups are predicted reasonably well, since the accuracy is low if either rate is low. In particular, if the algorithm assigns all cases to a single group, the accuracy becomes zero. The accuracy also depends on how the data are split, so we average it over ten random realizations of the splitting. These average accuracies, calculated for the $\alpha_{1}$-$\gamma_{1}$ planes, are shown in Fig. 10 as a function of the size of the training set for the different channels. The accuracy is quite high even for small training sets.
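The evaluation protocol can be sketched in Python with NumPy as below. To keep the example dependency-free, a nearest-centroid classifier stands in for the paper's RBF-kernel SVC (with scikit-learn one could substitute `sklearn.svm.SVC(kernel="rbf")`); the splits are also stratified by class, an assumption made here so that both groups always appear in the training and test sets. All function names are hypothetical, and the synthetic clusters in the usage lines are not the paper's data.

```python
import numpy as np

HEALTHY, UNHEALTHY = 1, 0

def accuracy_tp_tn(y_true, y_pred):
    """Return (t_p, t_n, t_p * t_n), with 'positive' meaning healthy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    healthy = y_true == HEALTHY
    t_p = np.mean(y_pred[healthy] == HEALTHY)     # healthy correctly identified
    t_n = np.mean(y_pred[~healthy] == UNHEALTHY)  # unhealthy correctly identified
    return t_p, t_n, t_p * t_n

def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier: label each test point by the closest class mean."""
    classes = np.unique(y_train)
    centres = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_test[:, None, :] - centres[None, :, :], axis=-1)
    return classes[np.argmin(dist, axis=1)]

def stratified_split(y, train_fraction, rng):
    """Random split keeping at least one sample of each class on both sides."""
    train = []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        k = int(round(train_fraction * idx.size))
        k = min(max(k, 1), idx.size - 1)
        train.extend(idx[:k])
    train = np.sort(np.array(train))
    test = np.setdiff1d(np.arange(y.size), train)
    return train, test

def mean_accuracy(X, y, train_fraction, n_splits=10, seed=0):
    """Average Accuracy = t_p * t_n over n_splits random train/test splits."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        tr, te = stratified_split(y, train_fraction, rng)
        y_hat = nearest_centroid(X[tr], y[tr], X[te])
        scores.append(accuracy_tp_tn(y[te], y_hat)[2])
    return float(np.mean(scores))


# Usage on two synthetic clusters (32 "healthy", 97 "unhealthy" points)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([1.0, 1.0], 0.3, (32, 2)),
               rng.normal([-1.0, -1.0], 0.3, (97, 2))])
y = np.array([HEALTHY] * 32 + [UNHEALTHY] * 97)
print(mean_accuracy(X, y, train_fraction=0.3))
```

Sweeping `train_fraction` and plotting the returned averages would give a curve of the kind shown in Fig. 10, under the substitutions noted above.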

4 Comparison with beat replicated data

The results described in the previous section show that the multifractal analysis is highly successful in separating the healthy and unhealthy classes. Thus, given an ECG time series of a person not known a priori to belong to either group, we can calculate the corresponding $\alpha_{1}$-$\gamma_{1}$ values and, from their location in this parameter plane, predict the class of the person with reasonable confidence. However, the definition of healthy and unhealthy can in general be quite subjective. For example, within the healthy class, ECG characteristics differ with age, gender, habits, lifestyle etc. This makes comparing the measures of one set with those of another somewhat arbitrary unless both share the same age, gender, habits etc., which is not always practical. We therefore introduce a finer method of analysis, by checking how the multifractal properties of a given dataset compare with those of beat-replicated data generated from single beats of the same ECG. This in effect compares each dataset with itself, and the range of variations can be used as a quantifier for normal and abnormal hearts.

We extract $10$ different randomly chosen beats from each signal (see Methods). We then replicate each of these beats to obtain time series of approximately the same size as the original time series and perform multifractal analysis on each of them to obtain the parameters $\alpha$ and $\gamma$. Fig. 11 shows the distributions of $\alpha_{1}$ values for $20$ randomly chosen subjects from each group, together with the original $\alpha_{1}$ value in each case (yellow circles). For healthy subjects, the actual $\alpha_{1}$ values tend to coincide with the $\alpha_{1}$ values of the beat-replicated time series. For unhealthy subjects, on the other hand, the actual $\alpha_{1}$ values tend to lie far from the mean of the replicated $\alpha_{1}$ values. To quantify this, we define $\delta\alpha_{1}$ as the difference between the mean of the replicated $\alpha_{1}$ values and the $\alpha_{1}$ value of the full time series. The histograms of this quantity for the two groups, shown in Fig. 12, are well separated from each other. This implies that the two classes can be distinguished solely by comparing each dataset with its own beat-replicated data sets.
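The beat-replication step and the quantity $\delta\alpha_{1}$ can be sketched in Python as follows. Assumptions: the beat length `period` is already known (for example from the autocorrelation procedure in Methods), beats start at uniformly random sample positions (the paper does not specify how the starting points are chosen), and the multifractal estimation of $\alpha_{1}$ itself is not shown; all function names are hypothetical.

```python
import numpy as np

def random_beats(s, period, n_beats=10, seed=0):
    """Cut n_beats segments of length `period` starting at random positions."""
    s = np.asarray(s, dtype=float)
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, s.size - period + 1, size=n_beats)
    return np.array([s[i:i + period] for i in starts])

def replicate_beat(beat, n_total):
    """Repeat a single beat end to end and truncate to n_total samples."""
    beat = np.asarray(beat, dtype=float)
    reps = -(-n_total // beat.size)            # ceiling division
    return np.tile(beat, reps)[:n_total]

def delta_alpha1(alpha1_full, alpha1_replicated):
    """Mean alpha_1 of the beat-replicated series minus alpha_1 of the full series."""
    return float(np.mean(alpha1_replicated)) - float(alpha1_full)


# Usage: 10 replicated series of the same length as a toy 60000-point signal
s = np.sin(2 * np.pi * np.arange(60000) / 800.0)
beats = random_beats(s, period=800)
series = [replicate_beat(b, s.size) for b in beats]
print(beats.shape, series[0].shape)
```

Each replicated series would then go through the same multifractal pipeline as the original signal, and the resulting $\alpha_{1}$ values would be passed to `delta_alpha1` together with the full-series value.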

Materials and Methods

4.1 Data acquisition

For the analysis presented here, we used data from the PhysioBank archive of the PhysioNet resource (38). In total, we included data for $97$ abnormal subjects and $32$ healthy subjects. Each dataset consists of $15$ channels corresponding to different electrodes: the conventional $12$ leads ($i,ii,iii,avr,avl,avf,v_{1},v_{2},v_{3},v_{4},v_{5},v_{6}$) together with the $3$ Frank leads ($v_{x},v_{y},v_{z}$). In this work we concentrate only on the six channels $v_{1}$ to $v_{6}$, which correspond to electrodes placed directly on the chest. Each signal spans $60$ seconds and is digitized at $1000$ samples per second, giving $60000$ points per signal.

Of the $97$ patients, $79$ suffer from Myocardial Infarction (MI), $6$ from Cardiomyopathy, $4$ from Myocarditis, $2$ from Dysrhythmia and $1$ from Hypertrophy, while for the remaining $5$ the disease information is not available. The ages of the subjects in the healthy class range from $24$ to $69$, whereas those in the unhealthy class range from $41$ to $86$.

4.2 Detrending of the signals

ECG signals often contain global trends, as shown for a typical signal in the top panel of Fig. 13. As part of the pre-processing, we first remove these trends as described below; the detrended signal obtained in this way is also shown in Fig. 13.

To remove the undesirable trends, we fit a polynomial of degree $n$ to the signal and subtract it from the signal to obtain the detrended signal. To choose an appropriate degree $n$ for the fitting polynomial, we define the deviation $\delta_{(n)}$ of the original signal from the detrended signal as:

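$$\delta_{(n)} = \frac{1}{N}\sum_{i=1}^{N}\left(x_{o}(i) - x_{(n)}(i)\right)^{2}$$

where $x_{o}$ is the original signal, $x_{(n)}$ is the signal detrended with a polynomial of degree $n$, and $N$ is the number of points.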

We find that $\delta_{(n)}$ saturates as $n$ is increased, so for a given signal $n$ can be chosen beyond the saturation point. On this basis, we use $n=20$ to detrend all the datasets.
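A minimal Python sketch of this detrending and of $\delta_{(n)}$, assuming an ordinary least-squares polynomial fit against the sample index. NumPy's `Chebyshev.fit` is used instead of a raw power-basis fit because it rescales the time axis to $[-1, 1]$, which keeps a degree-20 fit numerically well conditioned. The paper does not specify its fitting routine, and the function names and toy signal are hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def detrend(x, n):
    """Subtract a degree-n least-squares polynomial trend from x."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size, dtype=float)
    trend = Chebyshev.fit(t, x, deg=n)(t)     # fit, then evaluate on the same axis
    return x - trend

def deviation(x, n):
    """delta_(n) = (1/N) * sum_i (x_o(i) - x_(n)(i))**2."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - detrend(x, n)) ** 2))


# Usage: a toy oscillation riding on a quadratic trend
t = np.arange(5000)
x = np.sin(2 * np.pi * t / 50.0) + 2.0 + 1e-3 * t + 5e-7 * t ** 2
deltas = [deviation(x, n) for n in range(1, 26)]   # inspect where delta_(n) levels off
```

Because each fit is a least-squares projection onto a larger polynomial space as $n$ grows, $\delta_{(n)}$ cannot decrease with $n$, which is consistent with its levelling off rather than oscillating.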

Embedding and phase space reconstruction

For uniformity, all the values in the time series $c(t_{k})$ are first scaled to lie between $0$ and $1$ using the "compression" transformation:

$$s(t_{k}) = \frac{c(t_{k}) - c_{\text{min}}}{c_{\text{max}} - c_{\text{min}}}$$

where $c_{\text{min}}$ and $c_{\text{max}}$ are the minimum and maximum values of the time series $c(t_{k})$, respectively. Each time series $s(t_{k})$ is then embedded in an $M$-dimensional space by constructing the vectors

$$x_{i} = [s(t_{i}),\; s(t_{i}+\tau),\; s(t_{i}+2\tau),\; \dots,\; s(t_{i}+(M-1)\tau)]$$

Here the time delay $\tau$, measured in units of the sampling interval $\Delta=t_{i+1}-t_{i}$, is the lag at which the autocorrelation of the signal falls to $1/e$ of its value at zero lag (29). For a time series of $N$ points there are in total $N-(M-1)\tau$ embedded vectors. Takens' embedding theorem guarantees that, for a sufficiently large embedding dimension, the attractor reconstructed from these vectors has the same topological properties as that of the original system (24).
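The normalization, the $1/e$ choice of $\tau$ and the delay embedding can be sketched in Python as below. Assumptions: the autocorrelation is the standard (biased) estimate computed via a zero-padded FFT, $\tau$ is taken as the first lag at which it drops below $1/e$, and all function names are hypothetical.

```python
import numpy as np

def normalize(c):
    """Min-max 'compression' of c(t_k) into [0, 1]."""
    c = np.asarray(c, dtype=float)
    return (c - c.min()) / (c.max() - c.min())

def delay_from_autocorrelation(s):
    """Smallest lag (in samples) at which the autocorrelation drops below 1/e."""
    x = np.asarray(s, dtype=float)
    x = x - x.mean()
    n = x.size
    spec = np.fft.rfft(x, n=2 * n)                # zero-pad: linear, not circular
    acf = np.fft.irfft(np.abs(spec) ** 2)[:n]
    acf /= acf[0]                                 # value 1 at zero lag
    below = np.flatnonzero(acf < 1.0 / np.e)
    if below.size == 0:
        raise ValueError("autocorrelation never falls below 1/e")
    return int(below[0])

def embed(s, M, tau):
    """Delay vectors x_i = [s_i, s_{i+tau}, ..., s_{i+(M-1)tau}]; N-(M-1)tau rows."""
    s = np.asarray(s, dtype=float)
    n_vec = s.size - (M - 1) * tau
    return np.column_stack([s[j * tau : j * tau + n_vec] for j in range(M)])


# Usage on a toy periodic signal with a period of 200 samples
s = normalize(np.sin(2 * np.pi * np.arange(10000) / 200.0))
tau = delay_from_autocorrelation(s)
X = embed(s, M=4, tau=tau)
print(tau, X.shape)   # X has N - (M - 1) * tau rows
```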

4.3 Extracting a single beat from an ECG time series

Identifying a single beat in an ECG signal is tricky, since beats do not repeat with exact periodicity. However, ECG signals do have an approximate periodicity because of the beats. For the data used, individual beats repeat with a period $T$ between $600$ and $1500$ milliseconds. We calculate the autocorrelation of the time series and find its highest peak in this range of lags. The corresponding lag is taken as the period of the signal and is used to extract single beats from the time series.
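A Python sketch of this period estimate, assuming the 1000 Hz sampling rate from the data description (so that lags in samples equal milliseconds) and taking the "highest peak" to be the largest local maximum of the autocorrelation inside the window; `beat_period` is a hypothetical name and the pulse train in the usage lines is a toy signal, not ECG data.

```python
import numpy as np

def beat_period(signal, fs=1000.0, t_min_ms=600.0, t_max_ms=1500.0):
    """Lag (in samples) of the highest autocorrelation peak in [t_min, t_max]."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = x.size
    # Biased linear autocorrelation via a zero-padded FFT (lags 0 .. n-1).
    acf = np.fft.irfft(np.abs(np.fft.rfft(x, n=2 * n)) ** 2)[:n]
    lo = int(round(t_min_ms * fs / 1000.0))
    hi = int(round(t_max_ms * fs / 1000.0))
    # Local maxima: larger than the left neighbour, not smaller than the right.
    k = np.arange(max(lo, 1), min(hi, n - 2) + 1)
    peaks = k[(acf[k] > acf[k - 1]) & (acf[k] >= acf[k + 1])]
    if peaks.size == 0:
        raise ValueError("no autocorrelation peak in the search window")
    return int(peaks[np.argmax(acf[peaks])])


# Usage: narrow pulses every 800 samples (800 ms at 1000 Hz)
t = np.arange(60000)
pulses = np.exp(-((t % 800) - 100.0) ** 2 / 50.0)
T = beat_period(pulses)
print(T)   # 800
```

Restricting the search to local maxima avoids returning a window edge when the autocorrelation is merely decreasing or increasing across the whole range.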

5 Discussion

We report the results of a detailed multifractal analysis of discretized ECG data for two sets of healthy and unhealthy subjects. Our study shows the highly complex fractal nature of a healthy heart, which is reduced by abnormalities in its function. We show that measures derived from the multifractal spectrum can detect abnormalities in heart dynamics with a reasonable level of accuracy. The fact that this is achieved with short ECG recordings enhances the scope for its applicability. Moreover, the analysis is objective and its results are quantitative.

The novelty of our approach is that subtle variations in the shapes of the beats are captured in quantitative measures, which are more reliable than conclusions derived from visual inspection. This information can be used to assess the health status and risk level of the heart. The fluctuations of the beat-to-beat dynamics revealed in the ECG exhibit a rich complexity that, when carefully characterized as in this work, can indicate cardiac malfunctions. Our analysis using beat-replicated data allows the quantifiers of one ECG to be compared with variations within the same ECG. The study thus advances both the basic understanding of the fractal nature of a complex system such as the heart and its abnormal states, and the design of clinically useful tools for effective diagnostics and therapy.


Acknowledgments

The authors acknowledge financial support from the Dept. of Sci. and Tech., Govt. of India, through Research Grant No. EMR/2014/000876. We also acknowledge the PhysioBank database (www.physionet.org/physiobank/database/) for the ECG data used in this study.


Bibliography

1. Lewis Jr WM (1975) Phase locking, period-doubling bifurcations, and irregular dynamics in periodically stimulated cardiac cells. Oecologia (Berlin) 19:75.
2. Glass L (2001) Synchronization and rhythmic processes in physiology. Nature 410(6825):277–284.
3. Karma A (2013) Physics of cardiac arrhythmogenesis. Annu. Rev. Condens. Matter Phys. 4(1):313–337.
4. Pijn JP, Van Neerven J, Noest A, da Silva FHL (1991) Chaos or noise in EEG signals; dependence on state and brain site. Electroencephalography and Clinical Neurophysiology 79(5):371–381.
5. Qu Z, Hu G, Garfinkel A, Weiss JN (2014) Nonlinear and stochastic dynamics in the heart. Physics Reports 543(2):61–162.
6. Webber CL, Zbilut JP (1994) Dynamical assessment of physiological systems and states using recurrence plot strategies. Journal of Applied Physiology 76(2):965–973.
7. Müller A, et al. (2016) Causality in physiological signals. Physiological Measurement 37(5):R46.
8. Mendis S, Puska P, Norrving B, et al. (2011) Global atlas on cardiovascular disease prevention and control. (World Health Organization).