# Dynamic context-aware multi-modal deep learning for longitudinal prediction of Parkinson’s disease progression

**Authors:** Amin Dehghanghanatkaman

PMC · DOI: 10.1038/s41598-025-31898-y · Scientific Reports · 2025-12-11

## TL;DR

A new deep learning model accurately predicts how Parkinson’s disease will progress over time by combining voice data, clinical features, and patient summaries.

## Contribution

A dynamic context-aware multi-modal deep learning framework that integrates voice biomarkers, clinical features, and NLP-derived embeddings for longitudinal PD progression prediction.
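The sketch below shows one way the modalities could be fused per visit and passed through multi-head self-attention across visits. It is a minimal NumPy illustration and not the author's implementation. Several details are assumptions: the feature shapes, repeating the patient-level text embedding at every visit, the random untrained projection weights, and the causal mask that stops a visit from attending to later visits. The source does not say how leakage is prevented. The bidirectional LSTM stage the paper describes is omitted.

```python
import numpy as np


def fuse_visits(voice, clinical, text):
    """Concatenate modalities into one (T, d) sequence of per-visit vectors.

    Assumed shapes: voice (T, d_v), clinical (T, d_c), and a patient-level
    text embedding (d_t,) that is repeated for every visit.
    """
    text_rep = np.tile(text, (voice.shape[0], 1))  # (T, d_t)
    return np.concatenate([voice, clinical, text_rep], axis=1)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, n_heads, rng, causal=True):
    """Scaled dot-product self-attention across visits (untrained weights)."""
    T, d = x.shape
    if d % n_heads != 0:
        raise ValueError("feature size must be divisible by n_heads")
    d_h = d // n_heads
    # Hypothetical random projections; a real model would learn these.
    w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split_heads(m):
        return m.reshape(T, n_heads, d_h).transpose(1, 0, 2)  # (H, T, d_h)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)  # (H, T, T)
    if causal:
        # Assumed leakage guard: visit t may only attend to visits <= t.
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    out = (weights @ v).transpose(1, 0, 2).reshape(T, d)  # merge heads
    return out @ w_o, weights


rng = np.random.default_rng(0)
x = fuse_visits(rng.standard_normal((4, 3)), rng.standard_normal((4, 2)), rng.standard_normal(3))
out, w = multi_head_self_attention(x, n_heads=2, rng=rng)
print(out.shape, w.shape)  # (4, 8) (2, 4, 4)
```

Concatenating the modalities before attention is the simplest fusion choice. A context-aware model might instead weight the modalities dynamically, and the summary does not specify which approach the paper uses.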

## Key findings

- The model achieves R² = 0.9925 ± 0.0027 across 40 patient-level cross-validation folds and significantly outperforms classical ML baselines.
- Text embeddings provide the largest incremental gain in prediction accuracy (3.82% RMSE reduction).
- Voice biomarkers modestly improve accuracy (2.72%) but greatly enhance prediction stability (10-fold lower variability).
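The reported gains are relative changes, and stability is summarised as a coefficient of variation (CV). The helpers below show how such figures are presumably computed. The formulas are standard, but the summary does not confirm them as the paper's exact definitions, and the fold-level RMSE values are hypothetical. As a consistency check, the abstract's CV = 0.27% matches SD / mean of the reported R² (0.0027 / 0.9925). This suggests, but does not prove, that the CV was computed on R² across folds.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction(baseline, candidate):
    """Percent by which `candidate` lowers an error metric relative to `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


def coefficient_of_variation(values):
    """Sample standard deviation over mean, in percent (e.g. across CV folds)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical fold-level RMSEs for two ablation configurations (not from the paper).
clinical_only = [0.80, 0.78, 0.83]
full_model = [0.74, 0.73, 0.76]
print(round(relative_reduction(statistics.mean(clinical_only), statistics.mean(full_model)), 2))

# Consistency check on the reported summary: SD / mean of R^2.
print(round(100 * 0.0027 / 0.9925, 2))  # 0.27
```

Whether the paper uses the sample or the population standard deviation is not stated. With 40 folds the difference is small.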

## Abstract

Accurately forecasting the progression of Parkinson’s disease (PD) motor symptoms in early-to-moderate stages is essential for timely intervention and personalized patient care but remains challenging due to heterogeneous and longitudinal symptom evolution. We present a novel dynamic context-aware multi-modal deep learning framework that predicts future motor symptom severity by integrating advanced voice biomarkers with signal processing techniques, clinical progression features, demographic metadata, and semantically enriched patient summary embeddings derived from comprehensive clinical narratives via state-of-the-art natural language processing. Leveraging bidirectional LSTMs augmented with multi-head self-attention, our architecture captures complex temporal dependencies while preventing information leakage. To ensure robust evaluation despite limited sample size (42 patients), we implemented repeated 5-fold cross-validation at the patient level (8 repetitions, 40 total folds), substantially exceeding standard evaluation rigor. Our approach achieves exceptional performance (R² = 0.9925 ± 0.0027, RMSE = 0.67 ± 0.19, MAE = 0.50 ± 0.15) with all 40 folds achieving R² > 0.989, significantly outperforming classical machine learning baselines (p < 1 × 10⁻⁵ and 0.002785) and all previously published methods on this dataset. Cross-validated ablation studies (240 total model trainings across 6 configurations) reveal that clinical features establish a strong baseline (R² = 0.9887 ± 0.0043), while text embeddings provide the largest incremental gain (3.82% RMSE reduction). Voice biomarkers contribute modestly to accuracy (2.72%) but substantially enhance stability (10-fold lower variability). The full multi-modal model achieves optimal performance (7.50% RMSE reduction vs. clinical-only) with the lowest variability (CV = 0.27%), demonstrating that dynamic cross-modal fusion enhances both accuracy and robustness. These findings, validated through 40 independent evaluations with each patient tested 8 times, demonstrate that integrating engineered temporal dynamics and contextual embeddings through advanced temporal modeling enables accurate longitudinal predictions of early-to-moderate PD progression. Complete code and implementation details are publicly available to ensure reproducibility.
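The patient-level protocol described in the abstract (42 patients, 5 folds, 8 repetitions) can be sketched as a repeated group-aware split. In this scheme every visit of a held-out patient stays out of training. This is a minimal sketch and not the author's code. The per-repeat seeding and round-robin fold assignment are assumptions, and the patient IDs are hypothetical. scikit-learn's `GroupKFold` is an off-the-shelf alternative for a single grouped split.

```python
import random
from collections import Counter


def repeated_patient_kfold(patient_ids, n_splits=5, n_repeats=8, seed=0):
    """Yield (repeat, fold, train_ids, test_ids); a patient never spans both sides.

    Shuffling per repeat and round-robin fold assignment are assumptions made
    for this sketch.
    """
    patients = sorted(set(patient_ids))
    if n_splits > len(patients):
        raise ValueError("more folds than patients")
    for r in range(n_repeats):
        shuffled = patients[:]
        random.Random(seed + r).shuffle(shuffled)  # new patient order each repeat
        folds = [shuffled[i::n_splits] for i in range(n_splits)]
        for f, test in enumerate(folds):
            held_out = set(test)
            train = [p for p in shuffled if p not in held_out]
            yield r, f, train, sorted(test)


ids = [f"P{i:02d}" for i in range(42)]  # 42 hypothetical patient IDs
splits = list(repeated_patient_kfold(ids))
print(len(splits))  # 40 folds = 8 repeats x 5 folds
print(6 * len(splits))  # 240 trainings for 6 ablation configurations
tested = Counter(p for _, _, _, test in splits for p in test)
print(set(tested.values()))  # {8}: every patient is held out once per repeat
```

Visit-level rows are then routed to train or test by patient ID. Splitting on patients rather than on individual visits keeps a patient's later visits from informing predictions on their own held-out data.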

## Linked entities

- **Diseases:** Parkinson’s disease (MONDO:0005180)

## Full-text entities

- **Diseases:** PD (MESH:D010300)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12808133/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12808133/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC12808133/full.md

---
Source: https://tomesphere.com/paper/PMC12808133