A Hybrid CNN-SVM Approach for ECG-Based Multi-Class Differential Diagnosis of PTSD, Depression, and Panic Attack
Parisa Ebrahimpour Moghaddam Tasouj, Gökhan Soysal, Osman Eroğul, Sinan Yetkin

TL;DR
This paper introduces a new AI system that uses ECG signals to accurately diagnose PTSD and distinguish it from depression and panic attacks.
Contribution
The study presents the first ECG-based hybrid AI framework for multi-class differential diagnosis of PTSD, depression, and panic attacks.
Findings
Hybrid CNN-SVM models achieved 97% accuracy in diagnosing PTSD and related disorders.
ResNet50 and AlexNet combined with SVMs outperformed standalone CNNs.
The system successfully distinguished PTSD from depression and panic attacks with high accuracy.
Abstract
Background: PTSD diagnosis is challenging. Symptoms overlap with depression and panic attacks. This causes misdiagnosis and delayed treatment. Current methods lack objective biomarkers. This study presents a hybrid AI framework. It combines CNNs and SVMs. The system detects PTSD from ECG signals. Methods: ECG data from 79 participants were analyzed. Four groups were included. PTSD patients numbered 20. Depression patients numbered 20. Panic attack patients numbered 19. Healthy controls numbered 20. Wavelet transform created scalograms. Three CNN models were tested. AlexNet, GoogLeNet, and ResNet50 were used. Deep features were extracted. SVMs classified the features. Five-fold validation was performed. Statistical tests confirmed significance. Results: Hybrid models performed robustly. ResNet50 + SVM and AlexNet + SVM achieved statistically equivalent results with accuracies of 97.05%…
Taxonomy
Topics: ECG Monitoring and Analysis · Emotion and Mood Recognition · Mental Health via Writing
1. Introduction
Post-traumatic stress disorder (PTSD) represents one of the most debilitating psychiatric conditions, characterized by intrusive re-experiencing, avoidance behaviors, negative alterations in cognition and mood, and hyperarousal symptoms [1]. Recent studies show that about 3.6% of adults in the U.S. experience PTSD each year, while lifetime prevalence is estimated at 6.8% in the U.S. and 3.9% across the world population [2]. Women are disproportionately affected by PTSD, with lifetime prevalence nearly twice as high in women (10–12%) as in men (5–6%). A similar disparity is evident in adolescence, where prevalence is 8.0% for females compared to 2.3% for males [3]. PTSD imposes a substantial economic burden, with annual costs reaching billion in the United States [4], while European data demonstrates healthcare expenditures three times higher than controls, with lifetime costs approximating €43,000 per patient [5].
The clinical differentiation of PTSD presents significant diagnostic challenges due to substantial symptom overlap with panic attacks and major depressive disorder [6]. PTSD and panic attacks present with similar autonomic symptoms, and panic attacks are reported as a secondary symptom in approximately 30–60% of PTSD patients [7]. The diagnostic complexity is further amplified by PTSD’s substantial comorbidity with major depressive disorder, as approximately 52% of individuals with PTSD meet criteria for comorbid depression, while 36–61% of patients presenting with primary depression harbor undiagnosed PTSD [8,9,10]. The diagnostic confusion stems from shared symptom clusters including anhedonia, emotional numbing, sleep disturbances, concentration difficulties, and social withdrawal [11]. Patients with dual PTSD-depression diagnoses exhibit reduced treatment response rates and longer recovery trajectories [12]. This diagnostic overlap necessitates sophisticated approaches to differential diagnosis, as misclassification can lead to suboptimal treatment selection [13].
The diagnostic confusion stems from convergent effects on shared cardiovascular pathways, particularly through dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and sympathoadrenal system [14]. Meta-analytic evidence demonstrates that PTSD confers a 55–61% increased risk of coronary heart disease [15], with all three disorders triggering excessive catecholamine release, resulting in identical acute cardiovascular manifestations including elevated heart rate, blood pressure fluctuations, and altered heart rate variability [16]. While anxiety states activate both HPA and sympathoadrenal axes simultaneously, panic attacks demonstrate predominant sympathetic activation with minimal HPA involvement [17]. The chronic dysregulation leads to sustained cardiovascular risk through endothelial dysfunction, accelerated atherosclerosis, increased inflammatory marker expression, and altered autonomic nervous system balance [18]. Recent research utilizing the Trier Social Stress Test reveals that PTSD patients exhibit blunted acute stress responses with slower cardiovascular recovery and reduced heart rate variability [19]. These shared pathophysiological mechanisms underscore the need for objective, signal-based diagnostic approaches [20].
The emergence of deep learning (DL) technologies has revolutionized psychiatric disorder classification by providing unprecedented capabilities to extract complex patterns from neuroimaging and physiological signals [21]. Recent studies have demonstrated the effectiveness of deep learning techniques in detecting cardiovascular diseases, identifying ECG arrhythmias, and performing automated health classification [22,23,24,25,26].
Recent bibliometric analyses reveal rapid growth in DL applications for mental health disorders, with over 2811 research publications demonstrating CNN accuracies exceeding 98% for depression, schizophrenia, and anxiety disorders [27,28]. Neuroimaging-based DL approaches utilizing EEG, fMRI, and structural MRI have demonstrated remarkable success, with CNN-LSTM hybrid architectures showing superior performance in capturing both spatial and temporal features [29,30]. Multimodal deep learning algorithms analyzing EEG, fNIRS, and neuroimaging data yielded significant results, achieving 97.26% classification accuracy for schizophrenia detection and 94.34% for generalized anxiety disorder [31,32]. However, current DL approaches face significant limitations, including small heterogeneous datasets, lack of external validation, and the inability to effectively differentiate between disorders with overlapping symptomologies such as PTSD, depression, and panic attacks [33]. This technological gap highlights the urgent need for innovative DL methodologies [34].
Despite remarkable achievements in psychiatric disorder classification through neuroimaging, ECG-based DL for PTSD detection remains largely unexplored [35]. While existing research has successfully utilized DL for cardiac pathology detection, psychiatric applications remain predominantly confined to binary PTSD versus control classification [36,37]. The continuous wavelet transform (CWT) emerges as the optimal solution, generating scalogram representations that simultaneously preserve time localization and frequency decomposition, creating rich visual patterns that CNNs can effectively process [38]. The scalogram representation maintains non-stationary characteristics of cardiac signals, captures transient events crucial for psychiatric state identification, and provides multi-resolution analysis across different time scales [39]. Recent methodological advances demonstrate that CWT-based scalograms enable CNNs to achieve superior performance in cardiac signal classification tasks [40]. However, current ECG-based psychiatric classification research faces significant limitations, including a focus on binary classification and the absence of comprehensive frameworks for distinguishing between overlapping psychiatric conditions [41].
Despite recent advances in artificial intelligence applications for mental health diagnostics, prior studies have predominantly focused on binary classification tasks (e.g., PTSD vs. control) and have not fully leveraged ECG-based features for multi-class differentiation. In addition, no comprehensive framework currently integrates time–frequency representations with hybrid deep and machine learning architectures to address overlapping psychiatric conditions.
To address these critical research gaps, this study introduces the first comprehensive framework for ECG-based multi-class psychiatric disorder classification through two key innovations. First, we develop a deep learning system capable of simultaneously differentiating PTSD, depression, panic attacks, and healthy control states from ECG signals, advancing beyond existing binary classification approaches to clinically relevant differential diagnosis. Second, we propose a novel hybrid CNN-SVM architecture that combines ResNet50’s automatic feature extraction with SVM’s robust classification performance, enhanced through PCA for optimal dimensionality reduction. This hybrid approach transforms CWT-derived scalograms into discriminative features that outperform individual CNN or traditional machine learning methods. Our framework employs rigorous 5-fold cross-validation and explores multiple ECG segment lengths to optimize diagnostic accuracy.
The main contributions of this study are summarised as follows:
- A novel multi-class ECG-based diagnostic framework is introduced for differentiating PTSD, major depression, panic disorder, and healthy controls, addressing a gap left by prior binary classification studies.
- Time–frequency representations (CWT-based scalograms) are used to capture patterns of autonomic dysregulation relevant to psychiatric conditions.
- Multiple deep learning architectures (AlexNet, GoogLeNet, ResNet50) are systematically compared for the multi-class psychiatric ECG classification task.
- A hybrid CNN–SVM pipeline enhanced by PCA is proposed to combine automatic deep feature extraction with robust machine-learning discrimination.
- Four ECG segment durations (5 s, 10 s, 15 s, 20 s) are evaluated to investigate the effect of temporal resolution on diagnostic accuracy.
- A comprehensive evaluation is conducted using accuracy, precision, recall, F1-score, AUC, and confusion-matrix analyses to identify the best classifier and window length.
2. Materials and Methods
This dual-stage framework combines direct CNN classification with CNN-based feature extraction followed by SVM classification to maximize diagnostic accuracy. Figure 1 depicts the framework. First, raw ECG signals undergo normalization and baseline wander correction to eliminate low-frequency artifacts. Subsequently, the signals are segmented into fixed lengths suitable for time-frequency analysis. In the next stage, the CWT is applied to each segment to generate scalogram images that encode both temporal and spectral characteristics. These scalograms are fed into pre-trained convolutional neural networks (AlexNet, GoogLeNet, and ResNet50) for direct multi-class classification. In parallel, the CNN architectures are employed as feature extractors within a hybrid classification approach. Extracted features are standardized using Z-score normalization and reduced via Principal Component Analysis (PCA) before classification with a Support Vector Machine (SVM).
2.1. Dataset and Participants
ECG recordings were obtained from Gülhane Education and Research Hospital (2017–2022) under ethics approval (2024/25). All participants were psychiatrically evaluated and classified into PTSD (n = 20), depression (n = 20), panic attack (n = 19), and healthy control (n = 20) groups.
All ECG recordings were reviewed by a cardiologist to verify technical signal quality (baseline stability, adequate signal-to-noise ratio, minimal artifacts). All 79 recordings met quality standards. This technical validation was independent of psychiatric diagnosis confirmation by psychiatrists. Data were sampled at 200 Hz with electrode impedance 10 kΩ. The dataset is based on 5-min ECG recordings. Each recording contains approximately 60,000 samples. The signals were divided into segments of 5, 10, 15, and 20 s to generate scalogram images. This segmentation process resulted in a significant increase in the total number of images. The number of samples and scalograms for each segment duration are provided in Table 1. For clarity, throughout the manuscript and figures, group labels are abbreviated as follows: PTSD = Post-Traumatic Stress Disorder, DEPR = Major Depression, PANIK = Panic Disorder, and KONT = Healthy Control.
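The segmentation arithmetic described above can be sketched as follows. This is a minimal illustration that assumes non-overlapping windows and drops any incomplete final window; the function names are hypothetical, and the counts reported in Table 1 remain authoritative if the authors used a different windowing scheme.

```python
FS_HZ = 200            # sampling rate reported in the text
RECORDING_S = 5 * 60   # each recording lasts 5 minutes
N_PARTICIPANTS = 79    # total participants reported in the text


def segment_counts(segment_s, fs_hz=FS_HZ, recording_s=RECORDING_S):
    """Return (samples per segment, whole segments per recording)."""
    samples_per_segment = segment_s * fs_hz
    segments_per_recording = (recording_s * fs_hz) // samples_per_segment
    return samples_per_segment, segments_per_recording


def split_into_segments(signal, segment_s, fs_hz=FS_HZ):
    """Cut a 1-D sequence into non-overlapping windows (assumed scheme)."""
    n = segment_s * fs_hz
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


if __name__ == "__main__":
    for seconds in (5, 10, 15, 20):
        n_samples, n_segments = segment_counts(seconds)
        # Last column: scalograms if every participant contributes all windows.
        print(seconds, n_samples, n_segments, n_segments * N_PARTICIPANTS)
```

Under these assumptions a 5-s window yields 60 segments per recording, which agrees with the 60 × 60 correlation matrix mentioned for a representative subject.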
Because ECG signals are non-stationary, high correlation between consecutive 5-s segments taken from the same person is not expected. To check this, the correlation between 5-s segments was calculated for a representative subject. As shown in Figure 2, the 60 × 60 correlation matrix has a very low average correlation (−0.0038, std: 0.0943), suggesting that the segments can be evaluated independently.
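A check of this kind can be sketched as below. The text does not state which correlation coefficient was used or whether the diagonal was excluded from the summary statistics, so Pearson correlation and off-diagonal averaging are assumptions, and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(segments):
    """Pairwise correlation between all segments of one recording."""
    return [[pearson(a, b) for b in segments] for a in segments]


def off_diagonal_stats(matrix):
    """Mean and standard deviation of the off-diagonal entries."""
    size = len(matrix)
    values = [matrix[i][j] for i in range(size) for j in range(size) if i != j]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std
```

For a 5-min recording cut into 5-s windows, `correlation_matrix` would return a 60 × 60 matrix.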
2.2. Preprocessing
2.2.1. Artifact Removal
Baseline Wander represents a low-frequency interference component in ECG recordings that originates from external factors, physiological influences, and environmental noise sources. To address this issue, a finite impulse response (FIR) based high-pass zero-phase filter was implemented with a cutoff frequency set at 0.5 Hz. This filtering approach effectively eliminated low-frequency noise components from the acquired signals. Subsequently, the filtered data underwent Z-score standardization for amplitude normalization. Z-score standardization serves as a crucial preprocessing technique that harmonizes signal magnitudes across different recordings. This normalization strategy facilitates the analysis of heterogeneous signals within a consistent analytical framework by eliminating inter-signal amplitude discrepancies. The implementation of this preprocessing pipeline ensures signal homogeneity and enhances the reliability of subsequent analytical procedures.
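The amplitude normalization step can be illustrated with a short sketch. It covers only the Z-score stage, not the FIR high-pass filter, and it assumes the sample standard deviation (n − 1 denominator), which the text does not specify.

```python
import statistics


def zscore(signal):
    """Standardize a sequence to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(signal)
    std = statistics.stdev(signal)  # n - 1 denominator (assumption)
    if std == 0:
        raise ValueError("constant signal cannot be standardized")
    return [(value - mean) / std for value in signal]


if __name__ == "__main__":
    # Two toy recordings with very different amplitude ranges end up
    # on the same scale after standardization.
    print(zscore([0.1, 0.2, 0.3, 0.4]))
    print(zscore([100.0, 200.0, 300.0, 400.0]))
```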
2.2.2. Continuous Wavelet Transform (CWT)
CWT converts one-dimensional ECG signals into two-dimensional scalogram images, so that time and frequency features can be used together. By providing simultaneous time-frequency analysis, CWT effectively extracts non-stationary signal features. The resulting scalogram visualizes how frequency content changes over time, covering both low-frequency components (P and T waves) and high-frequency features (QRS complexes). The CWT is mathematically defined as follows [42]:
$$W(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt$$
where x(t) is the ECG signal, ψ* is the complex conjugate of the mother wavelet, a represents the scale parameter and b denotes the translation parameter. Various wavelet functions, including Bump, Morse, and Morlet wavelets, were evaluated for signal transformation. The Morlet wavelet demonstrated superior performance and was selected for this study due to its optimal frequency resolution characteristics. This transformation preserves both temporal and spectral information, making it particularly suitable for analyzing the dynamic characteristics of ECG signals.
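To make the transform concrete, the sketch below evaluates a discretized Morlet CWT directly as a sum over samples. It is an illustration only: the centre frequency `omega0 = 6`, the normalization, and the scale-to-frequency mapping are assumptions, and the study itself used MATLAB, whose wavelet definitions may differ. The direct sum costs O(N²) per frequency, so it is practical only for short signals.

```python
import cmath
import math


def morlet(t, omega0=6.0):
    """Complex Morlet wavelet (approximately analytic for omega0 >= 6)."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-t * t / 2)


def cwt_coefficient(signal, fs_hz, scale_s, shift_s, omega0=6.0):
    """Riemann-sum approximation of W(a, b) for one scale a and shift b."""
    dt = 1.0 / fs_hz
    total = 0j
    for n, x in enumerate(signal):
        t = (n * dt - shift_s) / scale_s
        total += x * morlet(t, omega0).conjugate() * dt
    return total / math.sqrt(scale_s)


def scalogram(signal, fs_hz, freqs_hz, omega0=6.0):
    """Magnitude |W(a, b)|: one row per frequency, one column per sample."""
    rows = []
    for f in freqs_hz:
        scale = omega0 / (2 * math.pi * f)  # scale whose centre frequency is f
        rows.append([abs(cwt_coefficient(signal, fs_hz, scale, n / fs_hz, omega0))
                     for n in range(len(signal))])
    return rows
```

For a pure 10 Hz tone, the row computed at 10 Hz carries most of the energy, which is the behaviour a scalogram image encodes as brightness.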
2.3. CNN Training and Optimization
2.3.1. Stochastic Gradient Descent (SGD) Optimization
Stochastic Gradient Descent (SGD) is a widely used optimization algorithm in deep learning models. Unlike traditional gradient descent algorithms, SGD calculates gradient values using randomly selected mini-batch samples instead of the entire dataset in each iteration. This method significantly reduces computational cost for large datasets and lowers the risk of getting stuck in local minima due to its stochastic nature. SGD’s basic update rule allows movement in the direction of the cost function gradient in the parameter space. Momentum-enhanced SGD (SGDM) uses weighted averages of previous updates to reduce oscillations in the optimization process and provide more reliable convergence. In this study, the momentum coefficient was set to 0.9 and the learning rate to 0.0001:
$$v_t = \gamma v_{t-1} + \eta \nabla_{\theta} J(\theta)$$
$$\theta = \theta - v_t$$
where $v_t$ is the momentum vector, $\gamma$ is the momentum coefficient (e.g., 0.9), $\eta$ is the learning rate (e.g., 0.0001), $\nabla_{\theta} J(\theta)$ is the gradient of the cost function, and $\theta$ is the parameter being updated [43].
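One common formulation of the momentum update keeps a running velocity, v ← γv + η∇J(θ), and then moves the parameters by θ ← θ − v. The sketch below implements that formulation with the reported defaults (γ = 0.9, η = 0.0001); it is an assumption that this matches the exact form used by the training software, and the function name is hypothetical.

```python
def sgdm_step(theta, velocity, grad, lr=1e-4, momentum=0.9):
    """One SGD-with-momentum update on lists of parameters."""
    new_velocity = [momentum * v + lr * g for v, g in zip(velocity, grad)]
    new_theta = [t - v for t, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


if __name__ == "__main__":
    # Toy problem: minimize J(theta) = theta^2, whose gradient is 2 * theta.
    # A larger learning rate than in the study is used so that progress
    # is visible within a few steps.
    theta, velocity = [1.0], [0.0]
    for step in range(5):
        grad = [2.0 * theta[0]]
        theta, velocity = sgdm_step(theta, velocity, grad, lr=0.1)
        print(step, theta[0])
```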
2.3.2. Hyperparameters
For both binary and multi-class classification, the CNN hyperparameters were selected carefully to ensure optimum performance during training. A mini-batch size of 20 samples was used to balance computational efficiency with gradient estimation accuracy; this maintained stable convergence while providing sufficient stochastic noise to prevent overfitting. The maximum number of epochs was set to 8 to prevent excessive memorization of training patterns and ensure good generalization capability. Stochastic Gradient Descent with Momentum (SGDM) was selected as the optimization algorithm due to its ability to accelerate convergence and reduce oscillations in the loss landscape through the incorporation of previous gradient information. The learning rate was conservatively set to 0.0001 to ensure stable weight updates without overshooting optimal solutions, which is particularly important given the sensitivity of deep networks to parameter changes. Validation frequency was configured to evaluate model performance every 10 training steps, enabling early detection of overfitting and providing regular monitoring of generalization performance throughout the training process. Hyperparameters were selected empirically based on validation performance in preliminary experiments: they were tuned using 20-s segments with AlexNet, then applied to all other configurations without further tuning. Table 2 presents the hyperparameters employed during model training.
All experiments were conducted on a MacBook Pro (2019; Apple Inc., Cupertino, CA, USA) with an Intel Core i5 processor (1.4 GHz), 8 GB RAM, and Intel Iris Plus Graphics 645, using MATLAB R2023a. Table 3 summarizes the training and inference performance across all architectures and segment lengths. Training times decreased substantially with longer segments due to reduced sample counts, while all models achieved real-time inference capability (<0.5 s per 5-min ECG) on consumer-grade hardware without requiring GPU acceleration.
2.3.3. Cross-Validation
Cross-validation is a technique used to evaluate a model’s generalization performance by dividing the dataset into k equal parts. In each iteration, one part is reserved as the test set while the remaining k − 1 parts are used for training. This process is repeated k times to ensure each part is tested. The final accuracy is calculated as the average of all k fold accuracies, providing a more reliable estimate than a single train-test split [44]. The selection of k value is based on the bias-variance trade-off, and selecting the optimal k value by testing different values is a standard practice. In this study, k = 5 was selected for cross-validation.
To prevent data leakage, cross-validation was performed at the participant level. In each fold, all segments from a given participant were assigned exclusively to either the training or the test set, ensuring no participant’s data appeared in both sets within any fold.
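A participant-level split of this kind can be sketched as follows. The helper name, the random assignment of participants to folds, and the seed are hypothetical; the sketch also does not stratify folds by diagnostic group, which the study may or may not have done.

```python
import random


def participant_folds(participant_ids, k=5, seed=0):
    """Build k (train, test) index splits with no participant on both sides.

    participant_ids[i] is the participant that segment i came from.
    """
    unique_ids = sorted(set(participant_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    # Assign whole participants, not individual segments, to folds.
    fold_of = {pid: position % k for position, pid in enumerate(unique_ids)}
    folds = []
    for fold in range(k):
        test = [i for i, pid in enumerate(participant_ids) if fold_of[pid] == fold]
        train = [i for i, pid in enumerate(participant_ids) if fold_of[pid] != fold]
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    # 79 participants with 60 five-second segments each (assumed counts).
    ids = [pid for pid in range(79) for _ in range(60)]
    for train, test in participant_folds(ids):
        print(len(train), len(test))
```

Splitting by segment index instead would let windows from the same recording land in both sets, which is exactly the leakage this design avoids.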
2.4. Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction of high-dimensional data. PCA creates a hierarchical coordinate system by finding directions that capture the maximum variance in the data. The algorithm first obtains mean-centred data by subtracting the mean of the data matrix, then performs eigenvalue decomposition of the covariance matrix.
In this study, PCA was applied to high-dimensional feature vectors extracted from CNN fully connected layers, retaining 95% cumulative explained variance to ensure minimal information loss while reducing computational complexity and overfitting risk. For AlexNet, GoogLeNet, and ResNet50 models, dimensionality reduction rates of 72%, 66%, and 61% were achieved, respectively. The PCA is mathematically defined as:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad B = X - \mathbf{1}\bar{x}^{T}, \qquad C = \frac{1}{n-1} B^{T} B, \qquad C V = V D, \qquad T = B V$$
where $\bar{x}$ is the mean of the data matrix X (whose n rows $x_i$ are the feature vectors), B is the mean-centered data matrix, C is the covariance matrix, D is the diagonal matrix of eigenvalues, V contains the corresponding eigenvectors obtained from the eigendecomposition of C, and T represents the transformed data in the principal component space [45]. Table 4 presents the original feature dimensions and the reduced dimensions after PCA for various CNN models and segment lengths.
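The 95% variance criterion reduces to a simple rule on the covariance eigenvalues: keep the smallest number of leading components whose eigenvalues sum to at least 95% of the total. The sketch below applies that rule to a list of eigenvalues; the function names and the toy eigenvalues are hypothetical, and the eigendecomposition itself is assumed to have been done elsewhere.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components reaching the variance target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(ordered)


def reduction_rate(original_dim, kept_dim):
    """Fraction of dimensions removed by the projection."""
    return 1.0 - kept_dim / original_dim


if __name__ == "__main__":
    toy_eigenvalues = [6.0, 3.0, 0.6, 0.4]  # illustrative values only
    kept = components_for_variance(toy_eigenvalues)
    print(kept, reduction_rate(len(toy_eigenvalues), kept))
```

A reduction rate of 72%, as reported for AlexNet, would correspond to keeping 28% of the original feature dimensions.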
2.5. Evaluation Metrics for Multi-Class Classification
In multi-class classification problems, performance evaluation employs micro and macro averaging approaches, which offer different computational methods and perspectives; Grandini et al. [46] provide a comprehensive examination of multi-class classification metrics. The micro averaging approach aggregates the TP, TN, FP, and FN values across all classes to compute a single metric. In this approach, class sizes (sample counts) influence the results, with larger classes carrying greater weight. The macro averaging approach, conversely, calculates metrics separately for each class and then takes the arithmetic mean of these metrics. This method is independent of class sizes and provides a balanced evaluation by assigning equal weight to each class.
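The difference between the two averaging schemes can be shown on a confusion matrix. The sketch below computes micro- and macro-averaged precision and also includes a multi-class Matthews correlation coefficient (MCC) in its commonly used generalized form; the exact metric definitions used in the study are not given in this section, so these are assumptions, and the toy matrix is invented for illustration.

```python
import math


def per_class_counts(cm):
    """Return (TP, FP, FN, TN) per class; cm[i][j] = true i, predicted j."""
    size = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for c in range(size):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(size)) - tp
        counts.append((tp, fp, fn, total - tp - fp - fn))
    return counts


def micro_precision(cm):
    """Pool TP and FP over all classes, then compute one precision."""
    counts = per_class_counts(cm)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    return tp / (tp + fp)


def macro_precision(cm):
    """Compute precision per class, then take the unweighted mean."""
    values = [tp / (tp + fp) if tp + fp else 0.0
              for tp, fp, _, _ in per_class_counts(cm)]
    return sum(values) / len(values)


def multiclass_mcc(cm):
    """Generalized MCC computed from a square confusion matrix."""
    size = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(size))
    true_counts = [sum(cm[i]) for i in range(size)]
    pred_counts = [sum(cm[r][i] for r in range(size)) for i in range(size)]
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt((total ** 2 - sum(p * p for p in pred_counts))
                            * (total ** 2 - sum(t * t for t in true_counts)))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    toy = [[50, 2, 1, 3], [1, 20, 4, 0], [0, 5, 18, 1], [2, 0, 1, 9]]
    print(micro_precision(toy), macro_precision(toy), multiclass_mcc(toy))
```

With imbalanced classes, as in the toy matrix, the micro average is pulled toward the largest class while the macro average weights the four classes equally.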
3. Results
3.1. Multi-Class Classification Performance
Three CNN architectures were evaluated for four-class classification of PTSD, depression, panic attack, and healthy control groups at varying segment lengths. The AlexNet model achieved an overall accuracy of 94.85% with an MCC value of 0.93 (Table 5). The GoogLeNet model showed improved performance, yielding an accuracy of 96.14% and an MCC of 0.95 (Table 6). The ResNet50 model achieved the highest performance on 5-s segments, with an overall accuracy of 96.65%, an MCC value of 0.96, and a micro-AUC of 0.998 (Table 7). In PTSD classification, ResNet50 provided the best metrics, achieving an accuracy of 95.70%, a sensitivity of 94.67%, and a false positive rate of 4.30%.
The clinically significant false omission rate (FOR) remained low in all models, ranging from 1.49% to 1.94%, which indicates a very low risk of missing PTSD cases. MCC values were above 0.92 in all models, indicating the strong correlation required for a psychiatric diagnosis. All models yielded their best results with 5-s segments, and performance decreased as segment duration increased. This finding is consistent with the acute cardiovascular features of PTSD attacks and supports the clinical significance of short ECG analysis windows.
3.2. Hybrid CNNs-SVM Multi-Class Classification
To further strengthen classification robustness, features extracted by CNN architectures were integrated with SVM classifiers following PCA dimensionality reduction. This hybrid approach combines the deep feature learning capacity of convolutional neural networks with the discriminative power of support vector machines, demonstrating superior performance in multi-class psychiatric disorder discrimination tasks.
SVM Configuration: Multi-class classification employed Error-Correcting Output Codes (ECOC) with linear kernel binary learners (MATLAB fitcecoc). Linear kernel was selected for its computational efficiency and appropriateness for PCA-transformed feature space. Features were standardized using Z-score normalization before PCA dimensionality reduction (95% variance retention), then classified via SVM.
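The ECOC idea can be illustrated with a small sketch. The text does not say which coding design was used, so the one-versus-one design and the hinge-loss decoding below are assumptions, and the binary learner scores in the example are invented; the study itself relied on MATLAB's `fitcecoc` rather than this code.

```python
from itertools import combinations

CLASSES = ["PTSD", "DEPR", "PANIK", "KONT"]  # group labels used in the manuscript


def one_vs_one_coding(n_classes):
    """Coding matrix: one row per class, one column per class pair.

    +1 marks the positive class of a binary learner, -1 the negative
    class, and 0 means the class is ignored by that learner.
    """
    pairs = list(combinations(range(n_classes), 2))
    matrix = [[0] * len(pairs) for _ in range(n_classes)]
    for column, (positive, negative) in enumerate(pairs):
        matrix[positive][column] = 1
        matrix[negative][column] = -1
    return matrix


def decode(coding, scores):
    """Pick the class whose code word has the lowest mean hinge loss."""
    losses = []
    for row in coding:
        active = [(m, s) for m, s in zip(row, scores) if m != 0]
        loss = sum(max(0.0, 1.0 - m * s) / 2 for m, s in active) / len(active)
        losses.append(loss)
    return losses.index(min(losses))


if __name__ == "__main__":
    coding = one_vs_one_coding(len(CLASSES))
    # Invented scores from the six binary SVMs for one scalogram.
    scores = [0.0, -2.0, 0.0, -2.0, 0.0, 2.0]
    print(CLASSES[decode(coding, scores)])
```

With four classes, a one-versus-one design trains six linear SVMs, one for each pair of groups.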
The hybrid models consistently improved performance compared to CNN-only configurations across all segment lengths, with particularly pronounced advantages for shorter temporal segments. Notably, 5-s segments showed the most distinct class separations in PCA space, with each psychiatric condition forming well-defined clusters. As segment length increased to 20 s, class boundaries became increasingly diffuse, highlighting the critical impact of temporal resolution on feature extraction quality from ECG scalograms.
Figure 3 illustrates the PCA visualization of ResNet50 features across different ECG segment lengths, revealing the critical impact of temporal resolution on classification performance. In 5-s segments (a), the four psychiatric classes form remarkably distinct clusters in PCA space, with depression (blue) creating a compact cluster in the lower right, control group (red) concentrated in the lower left, panic attack (yellow) positioned in the middle right, and PTSD (purple) clearly separated in the upper left region. As segment length increases to 10 s (b), class separations remain preserved but cluster boundaries begin to soften slightly. At 15 s (c), inter-class distinctions show more pronounced degradation, particularly between control and depression groups, while PTSD maintains its characteristic position. In 20-s segments (d), inter-class overlaps reach maximum levels with all groups exhibiting more diffuse distributions, though general class tendencies are still preserved. This progressive degradation emphasizes that cardiovascular manifestations of psychiatric disorders require short-duration, high-resolution analysis for optimal discrimination.
For AlexNet + SVM, the best performance was obtained with 5 s segments, achieving 97.26% overall accuracy, 0.96 MCC, and a micro-AUC of 1.00 (Table 8). PTSD classification reached 95.95% precision and 96.91% recall, with a low FDR of 4.04%, highlighting the model’s ability to minimize false negatives. Performance gradually decreased with longer durations, with accuracy dropping to 90.97% and MCC to 0.88 at 20 s.
Similarly, GoogLeNet + SVM achieved its peak performance at 5 s with 96.35% accuracy, 0.95 MCC, and 0.99 micro-AUC (Table 9). Precision and recall for PTSD were 94.20% and 94.83%, respectively. Despite robust results, a noticeable decline was observed for 15 s and 20 s, where accuracy fell to 91.33% and 90.31%, confirming the importance of short-segment ECG windows.
The ResNet50 + SVM hybrid achieved the best overall performance. With 5 s segments, it reached 97.05% accuracy, 0.97 MCC, and a nearly perfect micro-AUC of 1.00 (Table 10). PTSD classification yielded 95.98% precision and 95.50% recall, with the lowest FDR (1.29%) among all models. Although performance slightly declined at longer durations (overall accuracy 91.48% at 20 s), the hybrid ResNet50 maintained consistently higher MCC values compared to AlexNet and GoogLeNet.
The confusion matrices in Figure 4 illustrate that ResNet50 + SVM provided the clearest class separation with 5 s segments, while misclassifications increased at 10–20 s. Corresponding ROC curves in Figure 5 confirmed near-perfect separability at 5 s, with only minor reductions at longer durations. In summary, the CNNs-SVM hybrids demonstrated superior stability and generalization over CNN-only models. Among them, ResNet50 + SVM emerged as the most reliable, combining strong precision–recall balance with minimal false discovery rates. These findings highlight the potential of hybrid frameworks as clinically meaningful diagnostic tools for psychiatric ECG analysis, especially in minimizing the risk of missing PTSD cases.
3.3. Statistical Analysis
3.3.1. Statistical Significance Analysis of ResNet50 + SVM vs. CNN Models
The statistical significance of the observed performance differences was evaluated in Table 11. Paired t-tests and Wilcoxon tests were applied to the 5-fold cross-validation results. When the effect of SVM integration was examined, a 0.49% improvement (p = 0.009) for ResNet50 and a 2.41% improvement (p = 0.007) for AlexNet were found to be statistically significant, while no significant improvement was observed for GoogLeNet (p = 0.548). In comparing the hybrid models, AlexNet + SVM achieved the highest accuracy (97.26%) and performed marginally better than ResNet50 + SVM (97.05%) (p = 0.037). The difference between ResNet50 + SVM and GoogLeNet + SVM was not statistically significant (p = 0.086). These results confirm that SVM integration provides real performance gains, especially for ResNet50 and AlexNet architectures.
3.3.2. McNemar’s Test for Prediction-Level Comparison on Hybrid CNN + SVM Models
To complement fold-level paired tests, we performed McNemar’s tests comparing prediction-level disagreements between models. Table 12 presents the results.
McNemar’s test revealed no significant difference between ResNet50 + SVM and AlexNet + SVM (p = 0.502), indicating statistically equivalent performance despite the 0.21% mean accuracy difference. Both models significantly outperformed GoogLeNet + SVM (p < 0.05).
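McNemar's test depends only on the discordant predictions, that is, the samples one model classifies correctly and the other does not. The sketch below counts those pairs and computes a two-sided exact (binomial) p-value; whether the study used the exact or the chi-square form is not stated, so the exact form is an assumption, and the function names are hypothetical.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count samples only model A got right (b) and only model B got right (c)."""
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree in correctness
    smaller = min(b, c)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Invented labels and predictions for illustration.
    y_true = [0, 1, 2, 3, 0, 1, 2, 3]
    model_a = [0, 1, 2, 3, 0, 1, 2, 0]
    model_b = [0, 1, 2, 0, 1, 1, 0, 0]
    b, c = discordant_counts(y_true, model_a, model_b)
    print(b, c, mcnemar_exact(b, c))
```

A large p-value, as reported for ResNet50 + SVM versus AlexNet + SVM, means the two models' disagreements are roughly balanced in both directions.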
3.4. Error Analysis and Misclassification Patterns
Error analysis was performed on hybrid CNN + SVM models with 5-s segments, and the results are shown in Table 13. PTSD-Control confusion was most pronounced, while PTSD-Depression and PTSD-Panic confusions were minimal. Depression and panic attacks were also confused with each other. AlexNet + SVM and ResNet50 + SVM showed the best overall discrimination. The minimal confusion between PTSD and the other disorders supports the specificity of ECG biomarkers in differentiating PTSD from other psychiatric disorders.
3.5. Performance of Traditional Machine Learning Approaches
Table 14 presents the performance of conventional machine learning approaches using handcrafted statistical features extracted from ECG signals. These features include amplitude-based parameters (peak value, RMS, mean), variation measures (standard deviation, skewness, kurtosis), signal geometry descriptors (shape factor, crest factor, clearance factor, impulse factor), and signal quality metrics (SNR, SINAD, THD). All features were computed in MATLAB and selected based on their established effectiveness in biomedical signal processing for distinguishing pathological and healthy ECG patterns [35].
Traditional machine learning approaches achieved overall accuracies ranging from 32–44%, barely exceeding chance performance (25% for 4-class classification). The best-performing traditional method, Linear SVM, achieved 44.30% accuracy, while the worst-performing method, Neural Network, achieved only 31.65%.
The ROC analysis in Figure 6 clearly demonstrates the performance differences between classes. AUC values range from 0.405 to 0.791. The highest performance was obtained for the control class in the ensemble model with an AUC of 0.791. The lowest performance was observed for the PTSD class in the Three-Layer Neural Network with an AUC of 0.405. This wide performance range reflects the complexity of multi-class classification. These results demonstrate the necessity of deep learning for this complex multi-class psychiatric classification task, as our proposed hybrid CNN-SVM approach achieves approximately twice the accuracy of conventional methods.
4. Discussion
This study is based on ECG recordings from 79 participants at a single center. Collecting psychiatric ECG datasets is challenging, even for retrospective data, due to ethical constraints, expert requirements, and multicenter validation processes. Despite this limitation, the model demonstrated stable performance in stratified cross-validation. To address potential data leakage concerns, we employed strict subject-wise cross-validation, ensuring that no participant's segments appeared in both training and validation sets. However, our study lacks an independent test set, which represents an inherent trade-off given our limited sample size. The reported performance therefore reflects internal validation rather than true external generalization. Future studies will focus on multicenter external validation to establish clinical generalizability and assess model robustness across diverse populations.
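The subject-wise splitting described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `subject_wise_folds`, the round-robin fold assignment, and the figure of 60 segments per participant (a 5-min recording cut into non-overlapping 5-s windows) are assumptions. Unlike the study's procedure, this sketch does not balance diagnoses across folds.

```python
import random
from collections import defaultdict

def subject_wise_folds(segment_subjects, n_folds=5, seed=0):
    """Split segment indices into folds so that each subject is in exactly one fold.

    segment_subjects[i] is the participant ID that segment i belongs to.
    """
    subjects = sorted(set(segment_subjects))
    random.Random(seed).shuffle(subjects)
    # Round-robin assignment of whole subjects to folds
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}
    folds = defaultdict(list)
    for idx, subject in enumerate(segment_subjects):
        folds[fold_of[subject]].append(idx)
    return [folds[k] for k in range(n_folds)]

if __name__ == "__main__":
    # 79 participants, 60 segments each (assumed segmentation)
    segment_subjects = [s for s in range(79) for _ in range(60)]
    folds = subject_wise_folds(segment_subjects)
    for k, val_idx in enumerate(folds):
        val_subjects = {segment_subjects[i] for i in val_idx}
        train_subjects = set(segment_subjects) - val_subjects
        # No participant contributes segments to both sides of the split
        assert val_subjects.isdisjoint(train_subjects)
        print(f"fold {k}: {len(val_subjects)} validation subjects, {len(val_idx)} segments")
```

Splitting at the segment level instead would let windows from one recording fall on both sides of the split, which typically inflates accuracy.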
Hyperparameters were tuned on one configuration (20 s-AlexNet) and applied to the others. Nested cross-validation would provide more conservative estimates, but was computationally prohibitive. However, our best performance came from a configuration that was not used for tuning (5 s-ResNet50: 97.05% vs. 20 s-AlexNet: 91.48%), which mitigates concerns about overfitting to the tuning configuration.
A second important aspect is longitudinal ECG monitoring. Repeated recordings or wearable measuring devices make it possible to assess temporal symptom dynamics and treatment response. This would address the limitation of the single-session 5-min recordings used in this study. Our approach partially compensates for this limitation by dividing the 5-min ECG into shorter segments. These shorter windows capture more subtle autonomic changes and allow CNN models to extract deeper time-frequency features.
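To make the segmentation step concrete, the sketch below cuts a 5-min recording into consecutive windows. The sampling rate of 250 Hz, the use of non-overlapping windows, and the function name `segment_signal` are assumptions for illustration; the section does not state them.

```python
def segment_signal(signal, fs, window_s):
    """Cut a 1-D signal into consecutive non-overlapping windows of window_s seconds.

    Trailing samples that do not fill a whole window are dropped.
    """
    step = int(round(fs * window_s))
    return [signal[i:i + step] for i in range(0, len(signal) - step + 1, step)]

if __name__ == "__main__":
    fs = 250                          # assumed sampling rate in Hz
    recording = [0.0] * (5 * 60 * fs) # placeholder for a 5-min ECG
    for window_s in (5, 20):
        segments = segment_signal(recording, fs, window_s)
        print(f"{window_s}-s windows: {len(segments)} segments of {len(segments[0])} samples")
```

Under these assumptions, a single recording yields 60 windows at 5 s but only 15 at 20 s, so shorter windows also give the CNN more training images per participant.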
Explainability remains an open problem in deep learning. Current attribution methods do not provide fully reliable physiological interpretations for clinical decision-making. Future studies will integrate SHAP-based feature attribution and waveform-level analyses to better understand the cardiac dynamics driving model decisions. Recent studies have explored multimodal physiological fusion combining ECG with phonocardiogram (PCG), electrodermal activity (EDA), respiration, or accelerometry. The reference BSPC study [47] demonstrates that integrating complementary modalities increases feature diversity. However, these approaches require multiple synchronous physiological signals that are not routinely accessible in standard psychiatric clinics. Our study deliberately focused on a single-modality ECG framework to assess whether autonomic signatures alone carry sufficient discriminatory information for multiclass psychiatric discrimination. This design is consistent with the practical limitations of real-world psychiatric assessment, where ECG is often the only routine physiological signal collected. Nevertheless, multimodal fusion represents a valuable future direction, particularly for capturing complementary autonomic and behavioral markers.
In terms of computational feasibility, the model was trained and tested on a standard CPU-based workstation (Intel Core i5, 8 GB RAM) without GPU acceleration. Despite the modest hardware, inference per scalogram required only tens of milliseconds, demonstrating suitability for real-time or near-real-time deployment in future devices. This system is not designed as a standalone diagnostic tool, but rather as a decision support tool that can complement existing scales such as PCL-5, HAM-D, and clinical interviews.
5. Conclusions
This study presents, for the first time, a hybrid CNN-SVM approach based on ECG signals for differentiating PTSD, depression, and panic attacks. The method yielded diagnostic accuracy exceeding 97%. The ResNet50 + SVM and AlexNet + SVM models showed similar performance. Scalogram representations enabled simultaneous time-frequency analysis. CNN architectures extracted complex physiological features that could not be found manually. Five-second ECG segments were found to be optimal for capturing abrupt cardiovascular changes associated with psychiatric symptoms. Multiclass classification successfully differentiated PTSD from depression and panic attacks. Error analysis showed minimal confusion between PTSD and the other psychiatric disorders; PTSD-Control confusion was the main error pattern, supporting the value of these ECG biomarkers as objective diagnostic indicators. The hybrid CNN-SVM framework goes beyond existing binary classification approaches and offers a clinically ready diagnostic support system. This study lays a methodological foundation for future psychiatric disorder detection using cardiovascular biomarkers, enabling early intervention and reducing misdiagnosis rates.
6. Future Work
Future research will focus on enhancing the interpretability of the proposed hybrid CNN–SVM framework through explainable artificial intelligence (XAI) methods such as SHAP (SHapley Additive exPlanations) analysis. This will allow the identification of the most discriminative ECG-derived features contributing to PTSD, depression, and panic attack classification, providing physiological insight into disorder-specific cardiac dynamics. Additionally, extending the dataset to include larger and more diverse populations, as well as testing model generalizability across multi-center ECG databases, will further validate its clinical applicability. Integrating real-time analysis modules into portable ECG monitoring systems may also facilitate early detection and continuous psychiatric assessment in clinical settings. Longitudinal ECG datasets will also be incorporated in future research to enable monitoring of symptom evolution and clinical recovery trajectories. Future studies may also examine multimodal fusion (e.g., ECG + EDA or ECG + respiration), particularly in settings where additional physiological channels can be acquired reliably. Finally, future studies will explore nested cross-validation frameworks with separate inner and outer loops to enable more robust hyperparameter optimization and unbiased performance estimation, particularly in larger and more diverse datasets.
References
- [1] American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, 5th ed., Text Revision. American Psychiatric Association: Washington, DC, USA, 2022.
- [2] Kessler R.C., Ratanatharathorn A., Ng L., McLaughlin K.A., Bromet E.J., Stein D.J., Karam E.G., Meron Ruscio A., Benjet C., Scott K. Posttraumatic stress disorder in the World Mental Health Surveys. Psychol. Med. 2017, 47, 2260–2274. doi:10.1017/s0033291717000708. PMID: 28385165. PMCID: PMC6034513.
- [3] Merikangas K.R., He J.P., Burstein M., Swanson S.A., Avenevoli S., Cui L., Benjet C., Georgiades K., Swendsen J. Lifetime prevalence of mental disorders in U.S. adolescents: Results from the National Comorbidity Survey Replication–Adolescent Supplement (NCS-A). J. Am. Acad. Child Adolesc. Psychiatry 2010, 49, 980–989. doi:10.1016/j.jaac.2010.05.017. PMID: 20855043. PMCID: PMC2946114.
- [4] Davis L.L., Schein J., Cloutier M., Gagnon-Sanschagrin P., Maitland J., Urganus A., Guerin A., Lefebvre P., Houle C.R. The economic burden of posttraumatic stress disorder in the United States from a societal perspective. J. Clin. Psychiatry 2022, 83, 21m14116. doi:10.4088/JCP.21m14116. PMID: 35485933.
- [5] Bothe T., Jacob J., Kröger C., Walker J. How expensive are post-traumatic stress disorders? Estimating incremental health care and economic costs on anonymised claims data. Eur. J. Health Econ. 2020, 21, 917–930. doi:10.1007/s10198-020-01184-x. PMID: 32458163. PMCID: PMC7366572.
- [6] Tunnell N.C., Corner S.E., Roque A.D., Kroll J.L., Ritz T., Meuret A.E. Biobehavioral approach to distinguishing panic symptoms from medical illness. Front. Psychiatry 2024, 15, 1296569. doi:10.3389/fpsyt.2024.1296569. PMID: 38779550. PMCID: PMC11109415.
- [7] Cackovic C., Nazir S., Marwaha R. Panic disorder. In StatPearls. StatPearls Publishing: Treasure Island, FL, USA, 2023. PMID: 28613692.
- [8] Xu W., Yuan H., Wu X., Wang W. Comorbidity patterns of posttraumatic stress disorder and depression symptoms: Cross-validation in two postearthquake child and adolescent samples. Depress. Anxiety 2023, 4453663. doi:10.1155/2023/4453663. PMID: 40224591. PMCID: PMC11921845.
