AI-based approach for heart failure readmission prediction using SCG, ECG, and GSR signals

Rajkumar Dhar; Md Rakib Hossen; Peshala T Gamage; Richard H Sandler; Nirav Y Raval; Robert J Mentz; Hansen A Mansy

PMC · DOI: 10.1088/1361-6579/ae178c · November 4, 2025

Open Access

TL;DR

This study uses AI to predict heart failure readmissions by analyzing chest vibrations and other signals, aiming to improve patient management.

Contribution

A novel AI-based method for predicting heart failure readmission using SCG, ECG, and GSR signals is proposed.

Findings

01

Machine learning models outperformed deep learning in predicting HF readmission.

02

K-nearest neighbor achieved 89.4% accuracy in classifying readmitted and non-readmitted patients.

03

SCG features were found to correlate with heart failure conditions.

Abstract

Objective. Heart failure (HF) is considered a global pandemic because of increasing prevalence, high mortality rate, frequent hospitalization, and associated economic burden. This study explores a noninvasive method that may help in managing HF patients by predicting HF readmission. Methods. Seismocardiogram (SCG) signal is the low-frequency chest vibration produced by the mechanical activity of the heart. SCG signal was acquired from 101 patients with HF, including those readmitted to the hospital during the study period. SCG signals were segmented into heartbeats and clustered based on respiration phases. Features were extracted from each cluster. Several conventional machine learning (ML) models were developed using selected SCG and heart rate variability features. Furthermore, SCG signals were transformed into images using a time–frequency distribution method. Images were used to…


Figures (8)

K-medoid clustering of SCG beats of a representative recording session (Subject 25, 3rd session) in lung volume-flow rate space. Blue circles and red triangles are the beats of the two clusters. A decision boundary (dashed line) is plotted to show the clear separation between the two clusters.

Feature importance scores of input variables computed using random forest based on Gini importance.

(a) SCG medoid beat of a representative subject (subject 3, session 3), and (b) the corresponding time–frequency distribution coefficient heatmap as calculated by PCT.

ROC curves for (a) the machine learning models (b) CNN models. The optimum thresholds are indicated by the yellow points (0.7 for ML and 0.5 for CNN).

Shows the feature values of (1) healthy, (2) non-readmitted, and (3) readmitted groups in a boxplot for (a) intra-session waveform variability before clustering, (b) inter-cluster variability, and (c) intra-cluster variability features. All the feature values are highest in the readmitted patient group and lowest in the healthy group. The differences between each pair of the groups are statistically significant as depicted by p-values (two-sample t-test) at the top of each image. Here, the numbers beside ‘p’ indicate the numbers of the groups being compared (1-healthy, 2-non-readmitted groups, 3-readmitted).

Shows the feature values of (1) healthy, (2) non-readmitted and, (3) readmitted groups in boxplot for (a) LFP, (b) HFP, (c) TP and (d) pNN50. HRV feature values are highest in healthy group and lowest in the readmitted patient group. P-values obtained by two sample t-test demonstrate that each pair of the groups are significantly different.

Tables (6)

Table 1. Classification of HF according to LVEF. Here, HFrEF is HF with reduced ejection fraction, HFmrEF is HF with mildly reduced ejection fraction, and HFpEF is HF with preserved ejection fraction (Heidenreich et al 2022). LVEF stands for left ventricular ejection fraction.

HF class	LVEF
HFrEF	⩽40%
HFmrEF	41%–49%
HFpEF	⩾50%

Table 2. Available demographics. Age information was not available.

Category	Details
Gender	Male: 62, female: 19
Height (m)	1.74 ± 0.11
Weight (kg)	101.7 ± 30.7
BMI (kg m⁻²)	33.3 ± 9.4
HF status	HFrEF: 75, HFpEF: 6
NYHA classification^a	I: 2, II: 14, III: 28, IV: 19

Table 3. Selected SCG (1–7) and HRV (8–11) features with short descriptions. The features (4–7) are obtained by averaging the features from the two cluster representative waveforms (for each recording session).

Feature index	Feature name	Description	Feature group
1	Intra-session waveform variability before clustering (WV_bc)	The dissimilarity among the SCG beats within a session. Dissimilarity was calculated using dynamic time warping (dtw) distance. $WV_{bc} = \frac{1}{n} \sum_{i=1}^{n} \frac{dtw(C, X_i)}{l_i}$ C: medoid beat before clustering, X_i: ith SCG beat, l_i: warping path length, n: number of SCG events in a session.	SCG features
2	Inter-cluster waveform variability (WV_inter)	Average dissimilarity between the medoid of a cluster and SCG beats of the other cluster. $WV_{inter} = \frac{1}{n_1 + n_2} \left[ \sum_{i=1}^{n_1} \frac{dtw(C_1, X_{i2})}{l_i} + \sum_{i=1}^{n_2} \frac{dtw(C_2, X_{i1})}{l_i} \right]$ n_1, n_2: number of events in cluster 1 and 2, C_1, C_2: SCG medoid of cluster 1 and 2, X_{i1}, X_{i2}: ith SCG event of cluster 1 and 2.	SCG features
3	Intra-cluster waveform variability (WV_intra)	Average dissimilarity between the medoid and SCG beats of the same cluster. $WV_{intra} = \frac{1}{n_1 + n_2} \left[ \sum_{i=1}^{n_1} \frac{dtw(C_1, X_{i1})}{l_i} + \sum_{i=1}^{n_2} \frac{dtw(C_2, X_{i2})}{l_i} \right]$	SCG features
4	Average RMS amplitude of instantaneous frequency (F_ins)	Instantaneous frequency (F_ins) was calculated as the frequency first moment of the time–frequency distribution (PCT), normalized by the integral of PCT at that time instant. $F_{ins} = \frac{\int_{0.5}^{50} f \, \mathrm{PCT}(t, f) \, df}{\int_{0.5}^{50} \mathrm{PCT}(t, f) \, df}$ Then, the RMS of F_ins was calculated over the duration of the beats under consideration.	SCG features
5	Average turning point ratio (TPR)	$\mathrm{TPR} = \frac{N\left[ (x_i - x_{i-1})(x_i - x_{i+1}) > 0 \right]}{\text{length of the signal}}$ Quantification of the randomness in a time-series signal.	SCG features
6	Average sample entropy (SmEn)	$\mathrm{SmEn} = -\ln \frac{\mathrm{count}_{m+1}(\text{similar})}{\mathrm{count}_{m}(\text{similar})}$; here denominator and numerator are the number of matched template pairs of length m and m + 1 in the waveform, respectively (Richman and Moorman 2000).	SCG features
7	Average Higuchi dimension (D_H)	Measures the irregularity in a time-series signal (Higuchi 1988).	SCG features
8	Low frequency power (LFP)	Spectral power of heart rate (HR) in the 0.04–0.15 Hz frequency band.	HRV features
9	High frequency power (HFP)	Spectral power of HR in the 0.15–0.4 Hz frequency band.	HRV features
10	Total power (TP)	Total spectral power of HR in the 0–0.4 Hz frequency band.	HRV features
11	pNN50	Proportion of successive RR intervals that differ by more than 50 ms.	HRV features
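Two of the simpler entries in table 3 can be written out directly. The sketch below is only an illustration of the turning point ratio and pNN50 definitions; the function names are hypothetical, and pNN50 is returned as a proportion (not a percentage) to match the wording of the table.

```python
import numpy as np

def turning_point_ratio(x):
    """Fraction of samples that are local extrema (turning points).

    A sample x[i] is a turning point when
    (x[i] - x[i-1]) * (x[i] - x[i+1]) > 0, as in the TPR row of table 3.
    The count is divided by the full signal length.
    """
    x = np.asarray(x, dtype=float)
    mid, prev, nxt = x[1:-1], x[:-2], x[2:]
    n_turning = np.count_nonzero((mid - prev) * (mid - nxt) > 0)
    return n_turning / x.size

def pnn50(rr_ms):
    """Proportion of successive RR-interval differences larger than 50 ms."""
    rr_ms = np.asarray(rr_ms, dtype=float)
    diffs = np.abs(np.diff(rr_ms))
    return np.count_nonzero(diffs > 50.0) / diffs.size
```

For example, `turning_point_ratio([0, 1, 0, 1, 0])` counts three turning points in five samples, and `pnn50([800, 860, 850, 790])` counts two of the three successive differences.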

Table 4. Performance of the conventional machine learning models. Boldface marks the best value in each column.

Model	Sensitivity	Specificity	Precision	F1-score	AUC	Accuracy
KNN	0.88	0.90	0.78	0.83	0.88	0.89
MLP-NN	0.88	0.81	0.65	0.75	0.84	0.83
XGBoost	0.85	0.80	0.64	0.73	0.83	0.82

Table 5. Performance metrics for the CNN model. The 1st row shows the leave-one-subject-out cross-validation (LOOCV) metrics for the balanced dataset (38 patients). The 2nd row shows combined results after adding out-of-sample test set results.

ResNet-34	Sensitivity	Specificity	Precision	F1-score	AUC	Accuracy
LOOCV	0.80	0.82	0.78	0.79	0.86	0.81
Combined	0.80	0.81	0.63	0.70	0.87	0.81

Table 6. Performance comparison of various methods for predicting patient readmission. Boldface marks the best value in each column.

Methods	Subjects (readmitted)	Data type	Accuracy (%)	Sensitivity (%)	Specificity (%)	AUC
Shameer et al (2016)	1068 (178)	Electronic medical record	83.19	—	—	0.78
Awan et al (2019)	10 757 (2546)	Electronic health record	64.9	48.42	70.01	0.63
Cleland and Antony (2011)	501 (58)	Thoracic impedance	—	42.1	—	—
Stehlik et al (2020)	100 (49)	ECG, accelerometry, skin impedance, temperature, activity, posture	—	87.5	86.0	0.89
Yu et al (2005)	33 (10)	Thoracic impedance	—	76.9	—	—
Boehmer et al (2017)	900 (146)	Heart sounds, thoracic impedance, heart rate, activity, respiration rate	—	70	85.7	—
This study	81 (22)	SCG, ECG, GSR	88.9	87.8	90.1	0.88

Funding

National Heart, Lung, and Blood Institute (DOI: 10.13039/100000050)

Keywords

seismocardiogram; heart failure readmission; signal processing; machine learning

Taxonomy

Topics: Non-Invasive Vital Sign Monitoring · Cardiovascular Health and Disease Prevention · Heart Rate Variability and Autonomic Control

Full text

Introduction

Heart failure (HF) is a chronic progressive medical condition marked by the diminished capacity of the heart to effectively pump blood. HF is a major global health concern with an estimated 64 million cases worldwide (Savarese and Lund 2017) and 6 million in the United States (Virani et al 2020). The US figure is projected to rise to 8.5 million by 2030 (Bozkurt et al 2023). This increasing prevalence is mainly attributable to aging populations, who are at greater risk of developing HF. Advances in medical diagnosis and treatment have improved survival rates, prolonging life in individuals with HF (Savarese and Lund 2017). Nevertheless, the mortality rate related to HF is still very high. A meta-analysis by Jones et al showed that the 1- and 5-year survival rates of HF are 86.5% and 56.7%, respectively (Jones et al 2019). According to a more recent study, 28% of 263 525 patients died during the first year after their first HF hospitalization (Bozkurt et al 2023). In addition, the healthcare costs related to HF are substantial (Lesyuk et al 2018). The total cost of HF in the US was estimated at $43.6 billion and is projected to increase to $70 billion by 2030 (Urbich et al 2020, Heidenreich et al 2022). The main driver of HF healthcare cost is hospitalization (Shafie et al 2018), as HF is associated with very high hospital readmission rates. After discharge, about 25% and 50% of HF patients are readmitted within 30 days and 6 months, respectively (Virani et al 2020, Khan et al 2021). With the increase in HF prevalence, the readmission rate and associated costs are likely to increase in the coming years. Therefore, early readmission prediction may allow interventions that reverse patient deterioration and avoid readmission.

HF can be classified based on left ventricular ejection fraction (LVEF). LVEF is the fraction of blood pumped out of the heart’s left ventricle (LV) during systole. It provides a measurement of LV systolic function, which is responsible for ejecting oxygenated blood from the heart to the rest of the body. The normal range of LVEF is 50%–70% (Lang et al 2015). The classification of HF according to LVEF is given in table 1.

Table 1. Classification of HF according to LVEF. Here, HFrEF is HF with reduced ejection fraction, HFmrEF is HF with mildly reduced ejection fraction, and HFpEF is HF with preserved ejection fraction (Heidenreich et al 2022). LVEF stands for left ventricular ejection fraction.

HFrEF comprises approximately 50% of total HF cases (Murphy et al 2020). Patients with HFrEF have a higher mortality rate than those with HFpEF (Somaratne et al 2009, Burkhoff 2012). Although all-cause readmission is higher in HFpEF, HF readmission is higher in HFrEF (Cui et al 2020). In addition, the cost of readmission is higher in HFrEF patients (Sheikh et al 2021). Regardless of the HF class, the high readmission rate is avoidable with preventive measures (Desai and Stevenson 2012). Stauffer (2011) demonstrated that a post-discharge transitional care program can greatly reduce the HF readmission rate and the associated cost. Taking this into account, continuous efforts have been made to build an early and reliable HF readmission prediction model that may help clinicians make timely, targeted interventions to prevent readmissions.

Electronic health records (EHRs) and wearable sensors are the main data sources that have been used to predict HF readmission. EHRs include patient demographics, medications, vital signs, medical history, laboratory data, etc. Intrathoracic impedance, electrocardiogram (ECG), and seismocardiogram (SCG) signals can be acquired with wearable devices and used as predictors of HF readmission. The predictive accuracy reported by these studies varies widely. Shameer et al (2016) used EHR data and achieved 83.19% accuracy in 1068 patients. In another study, sensitivity and specificity of 48% and 70%, respectively, were achieved using medical data of 10 757 HF patients (Awan et al 2019). A review article showed that B-type natriuretic peptide (BNP) and N-terminal pro-brain natriuretic peptide (NT-proBNP) are the most used predictors from the EHR data (Liu et al 2022).

Other authors used sensor data to predict HF readmission. Intrathoracic impedance-based models obtained variable predictive accuracy ranging from 21% to 76%, suggesting the uncertainty in predicting HF readmission (Yu et al 2005, Cleland and Antony 2011, Heist et al 2014, Stehlik et al 2020). In Stehlik et al (2020), ECG, skin impedance, temperature, etc. were acquired from 100 patients at home with a multisensor patch for 3 months. High prediction accuracy was achieved (sensitivity = 86%, specificity = 87.5%) using the sensor data, although the study required baseline data for analysis. Boehmer et al used defibrillators implanted in patients to acquire data to predict hospitalization (Boehmer et al 2017). Invasive accelerometer-acquired heart sounds (similar to SCG), heart rate, intrathoracic impedance, respiration rate, and tidal volume data were collected from the implanted device, which was able to alert clinicians before HF hospitalization (sensitivity = 70%). In another SCG-based study, Lin et al identified HF patients by calculating LVEF from SCG and ECG signals (Lin et al 2018). In that study, 40 subjects were enrolled (25 HF and 15 healthy). The ratio of pre-ejection period to left ventricular ejection time was calculated from SCG and ECG signals and was found to be inversely proportional to LVEF (correlation coefficient 0.73). A threshold ratio of 0.33 distinguished HF from healthy participants with 96% accuracy (sensitivity 98% and specificity 94%). Inan et al used SCG signals to distinguish between compensated and decompensated HF patients (Inan et al 2018). The patients needed to perform the 6 min walk test (6MWT) in this study. Similarity between SCG signals before and after the test was used as a metric to differentiate the two groups. Higher similarity was found in decompensated patients, suggesting their reduced cardiovascular reserve.

Although the above studies had several limitations, such as requiring baseline data, requiring patients to perform the 6MWT, or using invasive measurements, they demonstrated the merit of the SCG signal in predicting HF readmission. The current study investigates the feasibility of using SCG and machine learning (ML) algorithms for HF readmission prediction when baseline measurements are not available.

Materials and method

2.1. Data acquisition

The dataset used in this study was collected at Advent Health Orlando after IRB approval by the University of Central Florida (protocol number: BIO-16-12783; the date of approval: March 6, 2023). The study was carried out according to the principles outlined in the Declaration of Helsinki. HF patients were recruited after their discharge from the hospital. Overall, 101 patients were included in this study. Data was acquired in single or multiple sessions per patient, following their provision of written informed consent. After an observer manually checked the data, 24 recording sessions were excluded due to poor quality of the acquired signals (zero voltage or noisy signal). This resulted in the exclusion of 20 patients from the study. Data analysis was performed in the remaining 81 patients who had a total of 142 sessions. The demographic information of the subjects is shown in table 2.

After the initial discharge, 22 patients (who attended 41 recording sessions) were readmitted to the hospital during the window of data acquisition (six months). The protocol included 3 min of data acquisition in each session when patients were sitting on a 45° inclined exam table with their legs extended. The following three signals were acquired from the patients:

i. SCG: Acquired using a tri-axial accelerometer (Model: 356A32, PCB Piezotronics, Depew, NY) placed on the chest surface at the 4th intercostal space near the left lower sternal border. The signal was amplified using a signal conditioner (Model: 482C, PCB Piezotronics, Depew, NY) with a gain of 100. The x, y, and z components of the accelerometer are pointed toward lateral (left to right), caudocranial (head to toe), and dorsal–ventral (normal to chest surface) directions, respectively. This study includes the analysis of the z-axis of the accelerometer.
ii. ECG: Acquired by IX-B3G bio-potential recorder (iWorx Systems, Inc., Dover, NH).
iii. Galvanic skin response (GSR): Provides an estimate of lung volume (Azad et al 2018). Acquired by IX-B3G bio-potential recorder.

All the signals were acquired at a sampling rate of 10 kHz. A schematic representation of data acquisition is shown in figure 1.

Schematic of experiment setup.

2.2. Data analysis

Overview: The workflow diagram of data analysis is shown in figure 2. The process started with filtering raw signals (band pass = 0.5–100 Hz), followed by the segmentation of SCG and ECG signals (Azad et al 2023). After that, SCG beats were clustered using an unsupervised clustering method (k-medoids clustering) (Gamage et al 2020). The clustering was correlated to the respiration phases, which were obtained from GSR signal. This clustering provides a medoid SCG beat for each cluster. Clustering features (described below) were extracted using the relationship between the medoid SCG beats and the rest of the SCG beats. Other time- and frequency-domain features were extracted from the cluster ‘representative’ beats (described below). Conventional ML models were trained and tested using selected SCG features along with a few heart rate variability (HRV) features. This concludes the first approach of analysis that utilizes conventional ML.

Flow diagram of data analysis.

In the second approach, a few SCG beats (3–5) that were closest (in terms of waveform shape) to the medoid beats were transformed into images using a time–frequency distribution method (polynomial chirplet transform or PCT). The images were fed to a CNN model for training and testing.

2.2.1. Preprocessing

After visually checking the signal quality, noisy portions of the data were discarded. This noise mainly came from patient movements. The rest of the data (usually 100–140 s) was considered for analysis. The raw ECG, SCG, and GSR signals were downsampled to 1 kHz. After that, ECG and SCG signals were forward–backward filtered using a 4th order Chebyshev type 2 bandpass filter with cutoff frequencies of 0.5 and 100 Hz. The GSR signal was detrended, and a flow rate signal was calculated by differentiating the GSR signal.
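The preprocessing chain described above can be sketched with SciPy as follows. This is a minimal illustration, not the authors' code: the stopband attenuation `rs=40` dB is an assumption (the text gives only the filter order and the 0.5 and 100 Hz edges), linear detrending is assumed for the GSR signal, and all function names are hypothetical. Note that for a Chebyshev type 2 design, SciPy interprets the edge frequencies as stopband edges.

```python
import numpy as np
from scipy import signal

FS_RAW = 10_000  # acquisition rate (Hz), from the text
FS = 1_000       # rate after downsampling (Hz), from the text

def downsample(x, factor=FS_RAW // FS):
    """Downsample with a polyphase anti-aliasing filter."""
    return signal.resample_poly(x, up=1, down=factor)

def bandpass_cheby2(x, fs=FS, low=0.5, high=100.0, order=4, rs=40.0):
    """Zero-phase (forward-backward) Chebyshev type 2 bandpass filter.

    rs (stopband attenuation in dB) is an assumed value.
    """
    sos = signal.cheby2(order, rs, [low, high], btype="bandpass",
                        fs=fs, output="sos")
    return signal.sosfiltfilt(sos, x)

def volume_and_flow(gsr, fs=FS):
    """Detrend the GSR (lung-volume surrogate) and differentiate it
    to obtain a flow-rate surrogate."""
    volume = signal.detrend(gsr)          # linear detrend (assumed)
    flow = np.gradient(volume) * fs       # derivative per second
    return volume, flow
```

Second-order sections with `sosfiltfilt` are used here because a narrow low-frequency edge (0.5 Hz at a 1 kHz sampling rate) can be numerically fragile in transfer-function form.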

2.2.2. Segmentation and normalization

The R-peaks of the ECG signal were detected using the Pan–Tompkins algorithm (Tompkins 1985). SCG and ECG beats were chosen to start 0.1 s before the ECG R-wave and end 0.1 s before the next R-wave. After segmentation, each SCG beat was normalized by its peak-to-peak amplitude.
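A minimal sketch of the segmentation and normalization step is given below. It assumes the R-peak sample indices have already been detected (for example by a Pan–Tompkins implementation, which is not reproduced here); the function name and arguments are hypothetical.

```python
import numpy as np

def segment_beats(scg, r_peaks, fs=1000, pre_s=0.1):
    """Cut SCG beats from 0.1 s before each R-peak to 0.1 s before the
    next R-peak, then scale each beat by its peak-to-peak amplitude.

    r_peaks: increasing sample indices of ECG R-peaks (assumed given).
    Returns a list of 1D arrays; beats may differ in length.
    """
    scg = np.asarray(scg, dtype=float)
    pre = int(round(pre_s * fs))
    beats = []
    for r_now, r_next in zip(r_peaks[:-1], r_peaks[1:]):
        start, stop = r_now - pre, r_next - pre
        if start < 0:
            continue  # not enough signal before the first R-peak
        beat = scg[start:stop]
        p2p = beat.max() - beat.min()
        if p2p > 0:
            beats.append(beat / p2p)
    return beats
```

Because beat duration follows the RR interval, the segmented beats generally have different lengths, which is one reason a length-tolerant distance such as DTW is convenient in the later clustering step.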

2.2.3. Unsupervised clustering (k-medoid clustering)

Studies on SCG signals reported that SCG signals have morphological variability (Azad et al 2019, Sandler et al 2019, Gamage et al 2020). The clusters of similar SCG beats were found to correlate with the respiration phases. It was suggested that grouping SCG beats into two clusters lowers the variability and makes the feature extraction more accurate (Gamage et al 2020). To group the SCG beats with close morphological features, the k-medoids clustering method was used. This unsupervised clustering method requires two initial beats, and efficient clustering depends on good initialization. In the current study, the SCG beats were initially divided into two groups based on either lung volume (high and low) or flow rate (high and low). SCG beats are considered more similar when the distance between them is smaller. Dynamic time warping (DTW) and cross-correlation were the two methods chosen to measure the distance (i.e. morphological dissimilarity) between the SCG beats. After dividing the beats into two groups based on lung volume or flow rate, the center beat of each group was chosen as the beat with the minimum sum of distances to the other beats in the same group. These two center beats, referred to as the initial medoids, were used to initialize the k-medoids method. After obtaining the initial medoids, the clustering process began. The algorithm alternated between updating the cluster medoids by calculating the sum of distances and updating the clusters by grouping the beats that have morphological similarities measured by DTW distance. The algorithm stopped when there was no change in the assignment of the SCG beats to the clusters in two consecutive iterations. As there were two bases of grouping (lung volume and flow rate) and two distance measuring methods (DTW and cross-correlation), all four combinations for obtaining the initial medoids were tried, and the combination that produced the best clustering of SCG beats was selected. Clustering quality was also checked by plotting the clustered beats in a lung volume-flow rate space (figure 3). A decision boundary was drawn to visualize the separation of the beats into two clusters.
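The initialization and iteration described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it uses a plain (unconstrained) DTW with absolute-difference local cost, covers only the DTW variant of the initialization, and takes the initial grouping (e.g. high versus low lung volume) as a given 0/1 array. All function names are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW with |a_i - b_j| as local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]

def pairwise_dtw(beats):
    """Symmetric matrix of DTW distances between all beats."""
    k = len(beats)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(beats[i], beats[j])
    return dist

def initial_medoids(dist, groups):
    """In each initial group (0 or 1), pick the beat with the smallest
    summed distance to the other beats of that group."""
    medoids = []
    for g in (0, 1):
        idx = np.flatnonzero(groups == g)
        sums = dist[np.ix_(idx, idx)].sum(axis=1)
        medoids.append(idx[np.argmin(sums)])
    return np.array(medoids)

def k_medoids(dist, medoids, max_iter=100):
    """Alternate medoid update and reassignment until labels stop changing."""
    labels = np.argmin(dist[:, medoids], axis=1)
    for _ in range(max_iter):
        new_medoids = medoids.copy()
        for c in range(len(medoids)):
            idx = np.flatnonzero(labels == c)
            if idx.size:
                sums = dist[np.ix_(idx, idx)].sum(axis=1)
                new_medoids[c] = idx[np.argmin(sums)]
        new_labels = np.argmin(dist[:, new_medoids], axis=1)
        medoids = new_medoids
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, medoids
```

Precomputing the full distance matrix keeps the clustering loop cheap; the quadratic DTW itself dominates the run time, so a production version would likely use an optimized DTW library.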

K-medoid clustering of SCG beats of a representative recording session (Subject 25, 3rd session) in lung volume-flow rate space. Blue circles and red triangles are the beats of the two clusters. A decision boundary (dashed line) is plotted to show the clear separation between the two clusters.

After obtaining the cluster medoids, the 15% of SCG beats in each cluster that were closest (measured by DTW distance) to the cluster medoid were averaged to create an SCG beat that is representative of that cluster. Features were extracted from both cluster medoids and cluster representatives.
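The cluster-representative beat can be sketched as a simple average over the closest beats. The sketch assumes the beats of a cluster have already been brought to a common length (the source does not say how length differences are handled before averaging) and that their DTW distances to the medoid are precomputed; the function name is hypothetical.

```python
import numpy as np

def cluster_representative(beats, dist_to_medoid, fraction=0.15):
    """Average the `fraction` of beats closest to the cluster medoid.

    beats: array of shape (n_beats, n_samples), equal-length beats (assumed).
    dist_to_medoid: distance of each beat to the cluster medoid.
    """
    beats = np.asarray(beats, dtype=float)
    n_keep = max(1, int(round(fraction * len(beats))))
    closest = np.argsort(dist_to_medoid)[:n_keep]
    return beats[closest].mean(axis=0)
```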

2.2.4. Feature extraction and selection

In total, 63 SCG features were extracted. These include clustering, time-domain, and frequency-domain features. In addition, 8 HRV features were added to complete the feature set. The random forest (RF) algorithm was employed for feature selection. RF is a popular and powerful algorithm that falls under the embedded feature selection methods. Embedded methods combine the benefits of the other two feature selection approaches (filter and wrapper): they allow interaction with the classifier (like wrapper methods) while being computationally lighter, and they can produce better classification results (Guo et al 2019, Pudjihartono et al 2022). Eleven features were selected (7 SCG and 4 HRV features). A list of selected features is given in table 3, and the feature importance scores are provided in figure 4.
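A random forest ranking of this kind can be sketched with scikit-learn, whose `feature_importances_` attribute reports impurity-based (Gini) importances. The number of trees, the random seed, and the rule of simply keeping the top 11 features are assumptions made for illustration; the source does not state how the cut-off was chosen.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_top_features(X, y, feature_names, n_select=11,
                        n_trees=500, seed=0):
    """Rank features by random forest Gini importance and keep the top ones.

    X: (n_sessions, n_features) feature matrix, y: 0/1 readmission labels.
    Returns a list of (name, importance) pairs, most important first.
    """
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_  # impurity-based importance
    order = np.argsort(importances)[::-1][:n_select]
    return [(feature_names[i], float(importances[i])) for i in order]
```

Impurity-based importances are computed on the training data, so in a cross-validated study the ranking would ideally be repeated inside each training fold to avoid information leakage.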

Feature importance scores of input variables computed using random forest based on Gini importance.

2.2.5. Image construction using time–frequency conversion

For the deep learning approach (approach 2 in figure 2), the PCT (a time–frequency distribution, or TFD, method) of the SCG signals was calculated and rendered as images. Depending on the length of session data, 3–5 SCG beats closest (as measured by DTW) to the medoid signals were processed by PCT. This resulted in 2D images with time and frequency information on the horizontal and vertical axes, respectively (figure 5(b)). The PCT coefficient values were presented using the ‘Parula’ colormap. PCT was found to be better suited than other TFD methods for SCG and heart sound-related studies (Taebi and Mansy 2017, Bao et al 2023).

(a) SCG medoid beat of a representative subject (subject 3, session 3), and (b) the corresponding time–frequency distribution coefficient heatmap as calculated by PCT.

2.2.6. Conventional ML algorithms

Three different ML algorithms were employed to evaluate the efficacy of the feature set in predicting HF readmission. These methods are k-nearest neighbor (KNN), multilayer perceptron neural network (MLP-NN), and extreme gradient boosting (XGBoost). Since there was an imbalance in the number of observations between the two classes, the decision threshold governing the conversion of the prediction probability to a class label was shifted from the default value of 0.5 and tuned to 0.7 to maximize sensitivity. The leave-one-subject-out cross-validation (LOOCV) approach was used for testing to avoid subject bias.
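Leave-one-subject-out evaluation with a shifted decision threshold can be sketched as below, using KNN as the example classifier. Several details are assumptions: the number of neighbors, the feature standardization step, and the choice to threshold the predicted probability of the readmitted class (label 1), since the source does not state which class probability the 0.7 threshold applies to.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def loso_predict(X, y, subject_ids, threshold=0.7, n_neighbors=5):
    """Leave-one-subject-out prediction with a custom decision threshold.

    All sessions of one subject are held out together, so no subject
    contributes to both training and testing in the same fold.
    Returns (probability of class 1, thresholded class labels).
    """
    proba = np.zeros(len(y), dtype=float)
    splitter = LeaveOneGroupOut()
    for train, test in splitter.split(X, y, groups=subject_ids):
        model = make_pipeline(
            StandardScaler(),
            KNeighborsClassifier(n_neighbors=n_neighbors),
        )
        model.fit(X[train], y[train])
        proba[test] = model.predict_proba(X[test])[:, 1]
    return proba, (proba >= threshold).astype(int)
```

Grouping by subject matters here because several patients contributed more than one recording session; splitting by session alone would let a model see other sessions of the test subject during training.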

2.2.7. Convolutional neural network

For image classification, a 34-layer residual network (ResNet-34) was used. ResNets have been widely used in image classification since being introduced by He et al (2015). Several ResNet-based time–frequency image classification tasks have been studied previously (Diker et al 2019, Zhang et al 2021, Liu et al 2022). Images were resized to 224 by 224 pixels with nearest neighbor interpolation to match the input requirement of ResNet-34. Image augmentation was performed by transformations such as random flips (horizontal and vertical) and rotation. The Adam optimizer with a learning rate of 0.000 008 was chosen. Cross-entropy loss was used as the training objective. The number of epochs was 30 with a batch size of 8.

A balanced dataset, including all the readmitted patients and a subset of non-readmitted patients, was created to address the class imbalance issue for the CNN. The number of observations for the two classes was balanced by randomly undersampling the majority class (non-readmitted patients). This dataset had 38 patients with 90 sessions (22 readmitted with 41 sessions) and was used for training and testing by LOOCV. The remaining 43 non-readmitted patients with 52 sessions were not included in the training and were only used for out-of-sample testing. These patients were tested using a model trained on data from all the sessions of the 38 patients. This also mimics a real-life application of the developed deep learning model, where the model is trained using the available HF patient data, and the trained model predicts the readmission of future HF patients.
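The patient-level split described above can be sketched as follows. The counts in the usage note follow from the text (81 patients, 22 readmitted, 38 in the balanced set, hence 16 non-readmitted patients kept and 43 held out); the function name, the seed, and the use of patient-level rather than session-level sampling are assumptions.

```python
import numpy as np

def split_balanced(patient_ids, readmitted, n_majority_keep, seed=0):
    """Randomly undersample non-readmitted patients.

    Returns (balanced_ids, held_out_ids): the balanced set contains every
    readmitted patient plus n_majority_keep randomly chosen non-readmitted
    patients; the rest of the non-readmitted patients are held out for
    out-of-sample testing.
    """
    rng = np.random.default_rng(seed)
    patient_ids = np.asarray(patient_ids)
    readmitted = np.asarray(readmitted, dtype=bool)
    minority = patient_ids[readmitted]
    majority = patient_ids[~readmitted]
    kept = rng.choice(majority, size=n_majority_keep, replace=False)
    balanced = np.concatenate([minority, kept])
    held_out = np.setdiff1d(majority, kept)
    return balanced, held_out
```

With 81 patient identifiers of which 22 are flagged as readmitted, `split_balanced(ids, flags, 16)` yields 38 balanced patients and 43 held-out patients.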

Results

Five metrics were used to show the results (equations (1)–(5)):

$$\text{sensitivity}/\text{recall} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}} \tag{1}$$

$$\text{specificity} = \frac{\text{True negative}}{\text{True negative} + \text{False positive}} \tag{2}$$

$$\text{precision} = \frac{\text{True positive}}{\text{True positive} + \text{False positive}} \tag{3}$$

$$F1\text{-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{4}$$

$$\text{accuracy} = \frac{\text{True positive} + \text{True negative}}{\text{True positive} + \text{True negative} + \text{False positive} + \text{False negative}} \tag{5}$$
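The five metrics defined above reduce to a few lines once the confusion-matrix counts are known. The sketch below is illustrative; the function name is hypothetical, and the counts in the usage note are made-up values, not results from this study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute the five reported metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }
```

For example, with the arbitrary counts `tp=8, fn=2, tn=15, fp=5`, the sensitivity is 0.8 and the specificity is 0.75. When the positive class is the minority, precision can be much lower than sensitivity even at high accuracy, which is why all five values are worth reporting together.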

The results obtained are presented in tables 4 and 5, and the ROC curves are shown in figure 6.

ROC curves for (a) the machine learning models (b) CNN models. The optimum thresholds are indicated by the yellow points (0.7 for ML and 0.5 for CNN).

These results suggest that conventional ML algorithms performed better than the deep neural network (DNN) model, with higher sensitivity. Specifically, KNN outperformed all other models, with close to 90% accuracy.

The quantitative comparisons of different HF readmission prediction models are presented in table 6.

Discussion

A non-invasive approach for predicting HF readmission was proposed and tested in this study. The linear acceleration in the dorsal–ventral direction was analyzed and used to classify HF patients (readmitted vs non-readmitted). Data analysis followed two different approaches: (a) conventional ML and (b) deep learning. In the first approach, features were extracted from SCG beats and HRV. Feature selection was then performed, and three different ML algorithms were applied. In the second approach, a time–frequency distribution (PCT) was applied to convert the time-domain signal into a 2D image containing time and frequency information. The images were resized and fed into a CNN (ResNet-34) for classification.
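The classification step of the first approach can be illustrated with a k-nearest-neighbor sketch. This is not the study's implementation: the value of k, the Euclidean distance, the two-feature layout, and all feature values below are assumptions made only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples.

    Euclidean distance and k=3 are assumptions, not the study's settings.
    """
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, normalized feature vectors:
# (SCG waveform variability, HRV feature). Not study data.
train_X = [
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),  # non-readmitted examples
    (0.80, 0.20), (0.90, 0.10), (0.85, 0.30),  # readmitted examples
]
train_y = ["non-readmitted"] * 3 + ["readmitted"] * 3

print(knn_predict(train_X, train_y, (0.82, 0.25)))
print(knn_predict(train_X, train_y, (0.12, 0.88)))
```

In practice, features would usually be scaled to comparable ranges before computing distances, since KNN is sensitive to feature magnitude.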

Results showed that handcrafted features provided better accuracy than the CNN method. One possible reason is that the feature set included HRV features, which were not provided to the CNN model. Given the higher performance of the conventional ML models (with the SCG and HRV features), it may be useful to discuss how these features correlate with HF conditions. The focus here is on the SCG clustering features and the HRV features.

The first three features in table 3 are the SCG clustering features. The first feature is intra-session waveform variability calculated before clustering. This feature represents the dissimilarity among SCG beats during a session. Inter- and intra-cluster variability features were also obtained after clustering. These features present the average dissimilarity of SCG beats between and within the clusters, respectively. Overall, these clustering features indicate the beat-to-beat waveform variability. The distributions of the clustering feature values in non-readmitted and readmitted patient groups are shown in figure 7. For comparison, feature values of a group of 14 healthy subjects are also shown. Data was acquired from the healthy subjects using the same protocol.
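The three clustering features described above can be sketched as average pairwise dissimilarities between beats. The sketch below assumes Euclidean distance as the dissimilarity measure and uses made-up beats and cluster labels; the measure and clustering used in the study may differ.

```python
import math
from itertools import combinations
from statistics import mean


def variability_features(beats, labels):
    """Return (intra_session, inter_cluster, intra_cluster) variability.

    beats  : equal-length sequences, one per heartbeat
    labels : cluster label for each beat
    Euclidean distance is an assumed dissimilarity measure. The input must
    contain at least one same-cluster pair and one cross-cluster pair.
    """
    pairs = list(combinations(zip(beats, labels), 2))
    dist = [(math.dist(a, b), la == lb) for (a, la), (b, lb) in pairs]
    intra_session = mean(d for d, _ in dist)           # all beat pairs
    intra_cluster = mean(d for d, same in dist if same)
    inter_cluster = mean(d for d, same in dist if not same)
    return intra_session, inter_cluster, intra_cluster


# Hypothetical three-sample "beats" in two clusters (not study data).
beats = [(0.0, 1.0, 0.0), (0.0, 1.1, 0.0), (0.0, 2.0, 0.0), (0.0, 2.1, 0.0)]
labels = [0, 0, 1, 1]

intra_session, inter_cluster, intra_cluster = variability_features(beats, labels)
print(intra_session, inter_cluster, intra_cluster)
```

For well-separated clusters, as in this toy input, the inter-cluster value exceeds the intra-cluster value.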

Figure 7. Boxplots of feature values for the (1) healthy, (2) non-readmitted, and (3) readmitted groups: (a) intra-session waveform variability before clustering, (b) inter-cluster variability, and (c) intra-cluster variability. All the feature values are highest in the readmitted patient group and lowest in the healthy group. The differences between each pair of groups are statistically significant, as shown by the p-values (two-sample t-test) at the top of each panel. The numbers beside ‘p’ indicate the groups being compared (1: healthy, 2: non-readmitted, 3: readmitted).

HF is associated with chronic sympathetic/parasympathetic imbalance, resulting in increased sympathetic and decreased parasympathetic drive (Binkley et al 1991, Mann 1999, Braunwald and Bristow 2000, Floras 2009). This also decreases peripheral acetylcholine (ACh) secretion (Roy et al 2014). ACh is the main neurotransmitter of the parasympathetic nervous system (Sam and Bordoni 2023). Inhibiting the binding of ACh to receptors in the heart has several effects, including increased heart rate and increased contraction force (Galper and Smith 1978, Moss et al 2018). In fact, increasing ACh might be a logical HF treatment, since it may reverse the effects of the decreased ACh seen in HF (Roy et al 2014, Koncz et al 2022).

The trend of increased beat-to-beat SCG waveform variability with worsened HF (see figure 7) may be explained by the decreased ACh release in HF. In an animal study, Ahammer et al (2018) reported that decreased ACh increased the beat-to-beat variability of contraction strength in murine atrial preparations. In that study, hearts were removed, and the atria were dissected from the ventricles. Variability analysis of contraction strength was performed under control and ACh-treated conditions. Variability of contraction strength was significantly higher in control tissue (which had lower ACh). This suggests that decreased ACh in HF may play a role in increasing the beat-to-beat variability of cardiac contraction. Increased cardiac contraction variability (associated with decreased ACh secretion) is believed to be a major contributor to SCG signal variability (Rienzo et al 2013, Taebi et al 2019). The lowest variability was found in the healthy group (figure 7), which is consistent with this explanation.

Another important factor to be considered here is the trend in HRV features. Figure 8 shows the boxplots of selected HRV features for the 3 groups of subjects.

Figure 8. Boxplots of feature values for the (1) healthy, (2) non-readmitted, and (3) readmitted groups: (a) LFP, (b) HFP, (c) TP, and (d) pNN50. HRV feature values are highest in the healthy group and lowest in the readmitted patient group. P-values from two-sample t-tests show that each pair of groups differs significantly.

Figure 8 shows that, compared to healthy subjects, HRV features decline in non-readmitted HF patients and decline further in readmitted HF patients. This may also reflect cardiovascular autonomic imbalance, as HRV is increased by parasympathetic nervous system activation and decreased by sympathetic nervous system activation (Berntson et al 1997).
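Of the HRV features in figure 8, pNN50 has a simple time-domain definition: the percentage of successive normal-to-normal (NN) interval differences that exceed 50 ms. A minimal sketch follows; the function name and the interval values are hypothetical and are not study data.

```python
def pnn50(nn_intervals_ms):
    """Percentage of successive NN-interval differences greater than 50 ms.

    Expects at least two intervals, given in milliseconds.
    """
    diffs = [abs(b - a) for a, b in zip(nn_intervals_ms, nn_intervals_ms[1:])]
    return 100.0 * sum(d > 50 for d in diffs) / len(diffs)


# Hypothetical NN intervals in ms (not study data).
# Successive differences: 60, 50, 5, 85, 5; two of the five exceed 50 ms.
intervals = [800, 860, 810, 815, 900, 905]
print(pnn50(intervals))  # 40.0
```

A lower pNN50 indicates less beat-to-beat change in heart period, which matches the direction of the decline described for the patient groups.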

The following are some of the important takeaways from this research:

1. The current study demonstrates the potential use of SCG signals in HF readmission prediction.
2. In this study, conventional ML algorithms, especially KNN, outperformed the DNN model. In addition to adding the HRV features, a larger dataset would benefit the DNN model. In the future, the performance of lightweight DNN models should be evaluated and compared with the current study results.
3. Future analysis should include the SCG signal in the two other directions (lateral and caudocranial axes). Including a 3-axis gyroscope sensor in the protocol would capture cardiac movement more completely by incorporating the angular velocity of the heart. This may yield more useful features related to HF readmission. Additionally, adding ERHs may improve the classification performance of the models.
4. The advantage of handcrafted features is their interpretability. Extracting features based on physiological knowledge can make the results more meaningful and reveal underlying characteristics of the data. On the other hand, the DNN model eliminated the need for manual feature engineering at the cost of interpretability. Future work should focus on understanding the DNN model results by incorporating explainable AI techniques.
5. More patient data is required to confirm the current study results and apply them in clinical settings.
6. One study limitation is the possibility that noncardiac comorbidities, such as chronic kidney disease, diabetes, and dementia, could be the cause of an HF readmission. SCG is limited to predicting readmissions associated with cardiac conditions.
7. More gender-balanced data should be collected to compare the outcomes for the two sexes. Age matching between healthy subjects and patients could not be achieved because age information was unavailable for the patient group. In future studies, efforts should be made to record and incorporate age data to enable more accurate comparisons.

Conclusion

This study describes a non-invasive technique to predict HF readmission. SCG, ECG, and GSR signals were acquired from non-readmitted and readmitted HF patients as well as normal subjects. After preprocessing and feature extraction, conventional ML algorithms and deep learning models were applied to classify the two patient groups. Results showed that the KNN model achieved the highest classification accuracy, about 90%. This suggests that the SCG signal has potential utility for monitoring patients with cardiac disease. Early HF readmission prediction may help clinicians identify patients who need special care and treatment and make rapid, targeted interventions to avoid readmission. This could improve the management of HF patients and may reduce the mortality rate. Patient populations with different cardiac conditions may be added in the future to support clinical application of SCG signals.

Bibliography

1. Ahammer H, Scheruebel S, Arnold R, Mayrhofer-Reinhartshuber M, Lang P, Dolgos Á, Pelzmann B and Zorn-Pauly K 2018 Sinoatrial beat to beat variability assessed by contraction strength in addition to the interbeat interval Front. Physiol. 9 546. doi:10.3389/fphys.2018.00546. PMID: 29867582. PMCID: PMC5968354
2. Awan S E, Bennamoun M, Sohel F, Sanfilippo F M and Dwivedi G 2019 Machine learning-based prediction of heart failure readmission or death: implications of choosing the right model and the right metrics ESC Heart Fail. 6 428–35. doi:10.1002/ehf2.12419. PMID: 30810291. PMCID: PMC6437443
3. Azad K, Gamage P T, Sandler R H, Raval N and Mansy H A 2018 Detection of respiratory phase and rate from chest surface measurements J. Appl. Biotechnol. Bioeng. 5 359–62. doi:10.15406/jabb.2018.05.00165
4. Azad M K, Gamage P T, Dhar R, Sandler R H and Mansy H A 2023 Postural and longitudinal variability in seismocardiographic signals Physiol. Meas. 44 025001. doi:10.1088/1361-6579/acb30e. PMID: 36638534. PMCID: PMC9969814
5. Azad M K, Gamage P T, Sandler R H, Raval N and Mansy H A 2019 Seismocardiographic signal variability during regular breathing and breath hold in healthy adults 2019 IEEE Signal Processing in Medicine and Biology Symp. (SPMB) pp 1–7. doi:10.1109/SPMB47826.2019.9037852
6. Bao X, Xu Y, Lam H-K, Trabelsi M, Chihi I, Sidhom L and Kamavuako E N 2023 Time-frequency distributions of heart sound signals: a comparative study using convolutional neural networks Biomed. Eng. Adv. 5 100093. doi:10.1016/j.bea.2023.100093
7. Berntson G G et al 1997 Heart rate variability: origins, methods, and interpretive caveats Psychophysiology 34 623–48. doi:10.1111/j.1469-8986.1997.tb02140.x. PMID: 9401419
8. Binkley P F, Nunziata E, Haas G J, Nelson S D and Cody R J 1991 Parasympathetic withdrawal is an integral component of autonomic imbalance in congestive heart failure: demonstration in human subjects and verification in a paced canine model of ventricular failure J. Am. College Cardiol. 18 464–72. doi:10.1016/0735-1097(91)90602-6. PMID: 1856414