# Interpretable machine learning for cognitive impairment prediction in Parkinson’s disease: a multicenter validation study with SHAP analysis

**Authors:** Ziyuan Wang, Junqiang Yan

PMC · DOI: 10.3389/fnagi.2025.1688653 · Frontiers in Aging Neuroscience · 2025-11-11

## TL;DR

This study develops an interpretable machine learning model to predict cognitive impairment in Parkinson’s disease using routine clinical data, validated across multiple populations.

## Contribution

A novel interpretable machine learning framework for detecting Parkinson’s disease-related cognitive impairment (PD-CI) from accessible clinical features, validated across diverse populations.

## Key findings

- Random Forest achieved the best performance of the four models tested (AUC = 0.83) in predicting PD-CI from routine clinical data.
- SHAP analysis identified age, neutrophil-to-lymphocyte ratio (NLR), and serum uric acid as key predictors of PD-CI.
- The model retained 71.57% accuracy in external validation, and the SHAP results point to inflammation and oxidative stress as key drivers.
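The findings report a threshold-free metric (AUC) for the discovery cohort and a threshold-dependent one (accuracy) for external validation, so the two numbers are not directly comparable. The minimal sketch below shows the difference: AUC is the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties count one half), while accuracy depends on a decision threshold. The 0.5 threshold, the function names, and the toy labels and scores are illustrative assumptions, not values or code from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = PD-CI, 0 = no PD-CI.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
print(roc_auc(labels, scores))   # 8/9, about 0.889
print(accuracy(labels, scores))  # 4/6, about 0.667
```

Here the same scores rank positives well (high AUC) yet misclassify two cases at the 0.5 cut-off, which is why a model's accuracy can sit well below its AUC.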

## Abstract

Parkinson’s disease (PD)-related cognitive impairment (PD-CI) is a common and impactful complication of PD, yet current predictive models often rely on specialized resources, lack interpretability, or have limited cross-population validation. This study aimed to develop an interpretable machine learning framework for PD-CI detection using only routine clinical data, addressing unmet needs in accessible and generalizable PD care.

We analyzed 1,279 participants from the Parkinson’s Progression Markers Initiative (PPMI) as the discovery cohort and 197 patients from an independent validation cohort. PD-CI was defined as a Montreal Cognitive Assessment (MoCA) score ≤26 together with a Unified Parkinson’s Disease Rating Scale Part I (UPDRS-I) score ≥1. Twenty-one clinical features, spanning hematological parameters, metabolic markers, and demographics, served as model inputs, and synthetic minority over-sampling was applied during preprocessing to correct class imbalance. Four machine learning models were trained and optimized with nested 5-fold cross-validation.
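The sketch below illustrates the three methodological steps in the paragraph above: the PD-CI labelling rule, SMOTE-style over-sampling (each synthetic case lies on the line segment between a minority-class case and one of its k nearest minority neighbours), and a nested 5-fold split in which hyperparameters would be tuned only on inner folds drawn from the outer training set. It is plain Python for illustration, not the study's code. Assumptions: the function names are hypothetical, UPDRS-I is treated as a single integer score, folds are shuffled but not stratified, and features are used unscaled. The summary does not say at which stage the study applied over-sampling; a common safeguard against leakage is to over-sample only the training portion of each fold.

```python
import math
import random
from typing import Iterator, List, Optional, Sequence, Tuple


def pd_ci_label(moca: int, updrs_i: int) -> int:
    """Study's stated rule: PD-CI if MoCA <= 26 and UPDRS-I >= 1."""
    return int(moca <= 26 and updrs_i >= 1)


def smote(minority: List[Sequence[float]], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[List[float]]:
    """Basic SMOTE: interpolate between a random minority sample and one of
    its k nearest minority-class neighbours (Euclidean distance)."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, nearest first.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


def kfold(indices: Sequence[int], k: int,
          rng: random.Random) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices and return k (train, test) splits covering every index once as test."""
    idx = list(indices)
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, folds[f]))
    return splits


def nested_cv(n: int, k_outer: int = 5, k_inner: int = 5, seed: int = 0
              ) -> Iterator[Tuple[List[int], List[int], list]]:
    """Outer folds estimate performance; inner folds, built only from the
    outer training set, are where hyperparameters would be tuned."""
    rng = random.Random(seed)
    for outer_train, outer_test in kfold(range(n), k_outer, rng):
        yield outer_train, outer_test, kfold(outer_train, k_inner, rng)
```

Keeping the outer test fold out of both tuning and over-sampling is what lets the outer-fold scores act as an unbiased performance estimate.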

The Random Forest algorithm achieved the best performance in the discovery cohort (AUC = 0.83), outperforming CatBoost (AUC = 0.82), XGBoost (AUC = 0.79), and a neural network (AUC = 0.66). In external validation, the framework retained 71.57% accuracy. SHAP interpretability analysis identified age, neutrophil-to-lymphocyte ratio (NLR), and serum uric acid as critical predictors and revealed synergistic risk effects between elevated inflammatory markers and reduced antioxidant levels.
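To make the SHAP result above concrete, the sketch computes NLR (absolute neutrophil count divided by absolute lymphocyte count) and then exact Shapley values by brute-force enumeration of feature coalitions, with features outside a coalition set to a baseline value. This is a swapped-in illustration of the Shapley idea, not the study's pipeline: the summary does not state which SHAP explainer was used (for tree ensembles the `shap` library offers `TreeExplainer`, which uses a different, faster algorithm). The risk function, its coefficients, the baseline patient, and all function names are hypothetical. The interaction term mimics the reported synergy between high NLR and low uric acid, and Shapley values split that shared effect between the two features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def nlr(neutrophils: float, lymphocytes: float) -> float:
    """Neutrophil-to-lymphocyte ratio from absolute counts in the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(rest, size):
                coalition = set(combo)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Hypothetical toy risk score (invented coefficients, NOT the study's model):
# older age and higher NLR raise risk, higher uric acid lowers it, and the
# high-NLR x low-uric-acid interaction adds extra risk when both are unfavourable.
def toy_risk(z: Sequence[float]) -> float:
    age, ratio, uric = z
    return (0.03 * (age - 65) + 0.2 * (ratio - 2.0) - 0.1 * (uric - 5.0)
            + 0.05 * max(ratio - 2.0, 0.0) * max(5.0 - uric, 0.0))


baseline = [65.0, 2.0, 5.0]           # hypothetical reference patient
patient = [75.0, nlr(6.0, 1.5), 3.5]  # NLR = 4.0
phi = shapley_values(toy_risk, patient, baseline)
print(phi)  # approximately [0.3, 0.475, 0.225]
```

The values sum to `toy_risk(patient) - toy_risk(baseline)` (the efficiency property), and the 0.15 interaction contribution is shared equally between NLR and uric acid, which is how a synergistic effect shows up in a Shapley decomposition.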

This framework demonstrates diagnostic accuracy comparable to advanced neuroimaging while utilizing readily available clinical data, enhancing accessibility in resource-limited settings. It highlights neuroinflammation and oxidative stress as key mechanistic drivers of PD-CI, advancing pathophysiological understanding. Multicenter validation confirms the model’s robustness across ethnic populations, supporting its utility as a clinically actionable tool for PD-CI screening and monitoring.

## Linked entities

- **Diseases:** Parkinson’s disease (MONDO:0005180)

## Full-text entities

- **Diseases:** PD (MESH:D010300), inflammation (MESH:D007249), neuroinflammation (MESH:D000090862), cognitive impairment (MESH:D003072)
- **Chemicals:** uric acid (MESH:D014527)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12643985/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12643985/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC12643985/full.md

---
Source: https://tomesphere.com/paper/PMC12643985