# An explainable ensemble machine learning model using baseline blood transcriptomics to predict Parkinson's disease motor progression

**Authors:** Yelda Fırat

PMC · DOI: 10.3389/fdgth.2026.1774436 · Frontiers in Digital Health · 2026-02-18

## TL;DR

This study combines baseline blood transcriptomics and clinical data in an explainable machine learning ensemble to predict Parkinson's disease motor severity 12 months ahead, and identifies the genes and biological pathways most associated with progression.

## Contribution

The novel contribution is an explainable machine learning model that combines blood transcriptomic and clinical features to predict Parkinson's motor progression and to identify the most informative gene-level and pathway-level predictors.
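
One way to turn clinical and transcriptomic inputs into model features is sketched here, assuming all inputs are standardized (z-scored) first. This summary does not say how the UPDRS_BL × PINK1 interaction or the pathway scores were computed, so both constructions are assumptions: the interaction is taken as the product of z-scored baseline UPDRS and z-scored PINK1 expression, and a pathway score as the per-sample mean z-score over a gene set. The gene set and all numbers are hypothetical toy data, not values from the study.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize values to mean 0 and population SD 1 (fails if all values are equal)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def interaction(updrs_bl, pink1_expr):
    """Hypothetical UPDRS_BL x PINK1 term: elementwise product of the two z-scored features."""
    return [u * p for u, p in zip(zscore(updrs_bl), zscore(pink1_expr))]

def pathway_score(expr_by_gene, genes):
    """Hypothetical pathway score: per-sample mean of z-scored expression across a gene set."""
    z_by_gene = [zscore(expr_by_gene[g]) for g in genes]
    # zip(*...) regroups the per-gene lists into per-sample tuples
    return [mean(sample) for sample in zip(*z_by_gene)]

# Toy data for four patients (not from the study).
updrs_bl = [20.0, 30.0, 25.0, 45.0]
expr = {
    "PINK1": [6.0, 5.0, 5.5, 7.5],
    "PRKN": [3.1, 2.8, 3.4, 3.0],
    "PARK7": [8.2, 8.0, 7.6, 8.9],
}
inter = interaction(updrs_bl, expr["PINK1"])
mito = pathway_score(expr, ["PINK1", "PRKN", "PARK7"])  # hypothetical gene set
print([round(v, 3) for v in inter])
print([round(v, 3) for v in mito])
```

Because each z-scored gene averages to 0 across patients, the pathway scores also average to 0. The interaction term is positive when a patient deviates from the cohort mean in the same direction on both inputs, which lets a model weight PINK1 differently depending on baseline severity.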

## Key findings

- On an independent test set (n = 78), the model predicted 12-month UPDRS Part III scores with an R² of 0.551 and an MAE of 6.01.
- The interaction between baseline UPDRS and PINK1 (UPDRS_BL × PINK1) was the most influential feature in the model (mean |SHAP| = 0.283).
- Among the pathway scores, mitochondrial dysfunction made the largest contribution to predictions (mean |SHAP| = 0.008).
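
The two reported metrics can be computed from scratch as follows (plain Python, matching the standard definitions used by, for example, scikit-learn's `r2_score` and `mean_absolute_error`). MAE is expressed in the units of the target, so an MAE of 6.01 means the 12-month UPDRS Part III predictions were off by about 6 points on average; R² compares the squared prediction error with that of always predicting the test-set mean. The four patients are toy values, not study data.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference, in the units of the target (UPDRS-III points)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mu) ** 2 for t in y_true)                 # total sum of squares
    return 1 - ss_res / ss_tot

# Toy 12-month scores and predictions for four patients.
y_true = [20.0, 35.0, 28.0, 41.0]
y_pred = [24.0, 31.0, 30.0, 38.0]
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mae)          # → 3.25
print(round(r2, 3)) # → 0.817
```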

## Abstract

Predicting Parkinson's disease (PD) motor progression remains challenging despite advances in neuroimaging. Blood-based transcriptomic profiling offers a more accessible and cost-effective alternative. This study aimed to develop and validate a machine learning approach using blood-based transcriptomic data to predict 12-month motor severity in PD and to identify the transcriptomic features and biological pathways most strongly associated with progression.

A Stacking Regressor ensemble model combining three gradient boosting algorithms (XGBoost, LightGBM, CatBoost) was developed using baseline Parkinson's Progression Markers Initiative (PPMI) data (n = 390), integrating blood RNA sequencing (RNA-seq) and clinical features to predict 12-month UPDRS Part III scores. SHapley Additive exPlanations (SHAP) analysis was applied to identify key prognostic features, evaluating seven PD risk genes (SNCA, LRRK2, GBA, PRKN, PINK1, PARK7, VPS35) and pathway scores for mitochondrial dysfunction, neuroinflammation, and autophagy.

On an independent test set (n = 78), the model achieved a Coefficient of Determination (R²) of 0.551 and Mean Absolute Error (MAE) of 6.01. SHAP analysis identified the baseline UPDRS × PINK1 interaction (UPDRS_BL × PINK1) as the most influential feature (mean |SHAP| = 0.283). Among transcriptomic features, VPS35 (mean |SHAP| = 0.010), GBA, and LRRK2 were most prominent. Mitochondrial dysfunction showed the highest pathway contribution (mean |SHAP| = 0.008).
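
The mean |SHAP| values quoted above are global importance scores: each feature's SHAP values are averaged in absolute value over all samples, and features are then ranked by that average. The sketch below assumes a per-sample SHAP matrix has already been computed (for example with the `shap` library; how the authors obtained SHAP values for the stacked model is not described in this summary). The feature names echo the text, but the values are toy numbers, not study results.

```python
def mean_abs_shap(shap_values, feature_names):
    """Global importance: mean |SHAP| per feature over samples, sorted high to low."""
    n = len(shap_values)
    scores = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

features = ["UPDRS_BL_x_PINK1", "VPS35", "mito_score"]
# Rows are samples, columns are features; signs show direction, magnitude shows impact.
shap_values = [
    [0.50, 0.05, 0.02],
    [-0.40, -0.03, 0.00],
    [0.60, 0.01, -0.01],
]
ranking = mean_abs_shap(shap_values, features)
# Interaction term ranks first, then VPS35, then mito_score.
print([(name, round(score, 3)) for name, score in ranking])
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise appear unimportant.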

The study establishes that machine learning integrating blood transcriptomics and clinical data effectively predicts motor progression in PD. Crucially, the interplay between the initial clinical state and specific genetic backgrounds, particularly PINK1, is a more powerful prognostic indicator than any factor alone. This study provides systematic evidence that mitochondrial dysfunction is a dominant prognostic signal for disease progression, nominating key genes and pathways for future mechanistic and therapeutic investigation.

## Linked entities

- **Genes:** SNCA (synuclein alpha) [NCBI Gene 6622], LRRK2 (leucine rich repeat kinase 2) [NCBI Gene 120892], GBA1 (glucosylceramidase beta 1) [NCBI Gene 2629], PRKN (parkin RBR E3 ubiquitin protein ligase) [NCBI Gene 5071], PINK1 (PTEN induced kinase 1) [NCBI Gene 65018], PARK7 (Parkinsonism associated deglycase) [NCBI Gene 11315], VPS35 (VPS35 retromer complex component) [NCBI Gene 55737]
- **Diseases:** Parkinson's disease (MONDO:0005180)

## Full-text entities

- **Genes:** GBA1 (glucosylceramidase beta 1) [NCBI Gene 2629] {aka GBA, GCB, GLUC}, PRKN (parkin RBR E3 ubiquitin protein ligase) [NCBI Gene 5071] {aka AR-JP, LPRS2, PARK2, PDJ}, PINK1 (PTEN induced kinase 1) [NCBI Gene 65018] {aka BRPK, PARK6}, SNCA (synuclein alpha) [NCBI Gene 6622] {aka NACP, PARK1, PARK4, PD1}, PARK7 (Parkinsonism associated deglycase) [NCBI Gene 11315] {aka DJ-1, DJ1, GATD2, HEL-S-67p}, VPS35 (VPS35 retromer complex component) [NCBI Gene 55737] {aka MEM3, PARK17}, LRRK2 (leucine rich repeat kinase 2) [NCBI Gene 120892] {aka AURA17, DARDARIN, PARK8, RIPK7, ROCO2}
- **Diseases:** Mitochondrial dysfunction (MESH:D028361), PD (MESH:D010300), disease (MESH:D004194), neurodegenerative disease (MESH:D019636), neuroinflammation (MESH:D000090862), motor and cognitive decline (MESH:D003072)
- **Chemicals:** coenzyme Q10 (MESH:C024989), levodopa (MESH:D007980), creatine (MESH:D003401)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12957191/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12957191/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12957191/full.md

---
Source: https://tomesphere.com/paper/PMC12957191