# Machine learning-based stratification of mild cognitive impairment in Parkinson’s disease: a multicenter cross-sectional analysis

**Authors:** Yanfang Liu, Meiling Chen, Peng Chen, Xiaohui Lin, Sangsang Chen, Chaoning Liu, Donghui Wang, Hongxing Deng, Qinghua Li, Yuan Wu

PMC · DOI: 10.1186/s12911-025-03215-0 · 2025-10-15

## TL;DR

This study develops and externally validates a machine learning tool that uses routine clinical data to estimate the current probability of mild cognitive impairment in patients with Parkinson’s disease, helping clinicians prioritize those who need detailed cognitive testing.

## Contribution

A clinic-ready, externally validated machine learning model for PD-MCI risk stratification using MoCA-based labels and routinely collected variables.

## Key findings

- Logistic regression showed balanced performance with AUCs of 0.789 (training), 0.778 (internal), and 0.772 (external).
- Education and motor severity were the strongest predictors of PD-MCI risk, followed by sex and age at disease onset.
- The tool is intended to prioritize MoCA-normal patients with a higher model-estimated probability for MDS Level II neuropsychological evaluation and closer follow-up.

## Abstract

Cognitive impairment is a prominent non-motor manifestation of Parkinson’s disease (PD) and is associated with reduced quality of life, increased mortality, and higher healthcare utilization. We aimed to develop and externally validate a machine-learning model, trained on Montreal Cognitive Assessment (MoCA)-based Movement Disorder Society (MDS) Level I labels, that estimates the contemporaneous probability of mild cognitive impairment in PD (PD-MCI) from routinely collected clinical variables. The intended use is to let clinicians prioritize MoCA-normal patients with higher model-estimated probability for MDS Level II neuropsychological evaluation and closer follow-up.

We analyzed 799 participants with PD from the Parkinson’s Progression Markers Initiative (PPMI), randomly assigning them to training (n = 559) and internal validation (n = 240) cohorts. An independent external cohort comprised 70 consecutive patients recruited at The Affiliated Hospital of Guilin Medical University between February 2024 and March 2025. The reference outcome was MoCA-based PD-MCI (MoCA score 21–25) versus cognitively normal PD (MoCA score 26–30). Candidate predictors were screened by LASSO (1-SE criterion). To handle class imbalance, SMOTE was applied only during model fitting; both validation cohorts retained native class distributions. Five machine-learning models (logistic regression [LR], support vector machine, XGBoost, neural network, LightGBM) were evaluated on non-resampled data for discrimination (area under the receiver operating characteristic curve, AUC), calibration, and clinical utility (decision-curve analysis, DCA). Interpretability combined a nomogram with Shapley additive explanations (SHAP); a bilingual web calculator was also implemented.
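The point that resampling touches only the training data can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature values are made up, the plain shuffle-and-cut split is an assumption (the abstract says only that assignment was random), and `split_cohort` and `smote_like` are hypothetical helper names. `smote_like` is a simplified hand-rolled stand-in for SMOTE that interpolates between a minority sample and one of its nearest minority neighbours.

```python
import random

def split_cohort(rows, n_train, seed=0):
    """Shuffle once, then cut into training and internal-validation sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (excluding base itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Toy cohort with the reported sizes: 799 rows, 169 positives.
# The two features per row are random placeholders, not PPMI data.
rng = random.Random(42)
cohort = [((rng.gauss(0, 1), rng.gauss(0, 1)), 1 if i < 169 else 0)
          for i in range(799)]

train, internal = split_cohort(cohort, n_train=559)

# Oversample the minority class in the training set ONLY.
train_pos = [x for x, y in train if y == 1]
train_neg = [x for x, y in train if y == 0]
new_pos = smote_like(train_pos, n_new=len(train_neg) - len(train_pos))
train_balanced = train + [(x, 1) for x in new_pos]

# `internal` is left untouched, so it keeps its native class distribution.
```

A model would then be fitted on `train_balanced` and scored on `internal` (and on the external cohort) exactly as sampled, so the reported discrimination and calibration reflect the real class mix.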

Of 799 PPMI participants, 169 (21.2%) met the MoCA-based PD-MCI definition. Seven routinely collected predictors were retained (sex, age, education, age at disease onset, MDS-UPDRS Part III, GDS, UPSIT). LR showed the most balanced performance: AUC 0.789 (training), 0.778 (internal), and 0.772 (external). At a fixed threshold of 0.50 in the external cohort, LR’s sensitivity was 89.7%, specificity 43.9%, and F1-score 66.7%. Calibration and DCA favored LR. SHAP indicated education and motor severity as dominant contributors, followed by sex and age at onset; depressive burden (GDS) and hyposmia (UPSIT) increased risk, whereas chronological age had a smaller marginal effect.
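As a consistency check on the threshold metrics, the short sketch below recomputes them from a confusion matrix. The four counts are an assumption: the abstract does not report them, but they are an integer split of the 70 external patients that reproduces all three reported percentages. `threshold_metrics` is a hypothetical helper name.

```python
def threshold_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall among true PD-MCI cases
    specificity = tn / (tn + fp)   # recall among cognitively normal cases
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# Assumed counts (not reported in the abstract), chosen to sum to n = 70.
tp, fn, fp, tn = 26, 3, 23, 18
assert tp + fn + fp + tn == 70

sens, spec, f1 = threshold_metrics(tp, fn, fp, tn)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, F1 {f1:.1%}")
# prints: sensitivity 89.7%, specificity 43.9%, F1 66.7%

# PD-MCI prevalence in the PPMI cohort, as reported.
print(f"prevalence {169 / 799:.1%}")
# prints: prevalence 21.2%
```

Under these assumed counts, about 47% of positive calls (23 of 49) would be false positives. That pattern fits a sensitivity-oriented triage tool, where a missed case is treated as more costly than an unnecessary Level II referral.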

We developed and externally validated a probability-based, clinic-ready risk-stratification tool for PD-MCI using routinely available variables and MoCA-based MDS Level I labels. Implemented as a nomogram and bilingual calculator, it supports sensitivity-oriented triage, especially among MoCA-normal patients, by prioritizing timely MDS Level II evaluation and closer follow-up. The tool complements, rather than replaces, formal diagnostic assessment and does not predict long-term conversion.
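The triage use described here can be written as a small decision rule. This is a sketch under stated assumptions, not the published calculator: `triage` and its return labels are hypothetical. The 0.50 cut-off is the fixed threshold used for the external-cohort metrics, and treating it as the referral cut-off is an assumption. The MoCA band 26–30 follows the study's definition of cognitively normal PD.

```python
def triage(moca_score, model_probability, threshold=0.50):
    """Suggest a next step from a MoCA score and a model-estimated
    PD-MCI probability. Illustrative only; not a diagnostic rule."""
    if not 0.0 <= model_probability <= 1.0:
        raise ValueError("model_probability must lie in [0, 1]")
    if not 0 <= moca_score <= 30:
        raise ValueError("MoCA scores range from 0 to 30")

    if moca_score < 26:
        # Below the MoCA-normal band: already flagged by Level I screening.
        return "screen_positive"
    if model_probability >= threshold:
        # MoCA-normal but higher estimated probability: prioritize Level II.
        return "prioritize_level_ii"
    return "routine_follow_up"

# Usage with made-up inputs:
print(triage(28, 0.62))  # prints: prioritize_level_ii
print(triage(28, 0.20))  # prints: routine_follow_up
print(triage(23, 0.90))  # prints: screen_positive
```

A lower threshold would raise sensitivity at the cost of more Level II referrals. Decision-curve analysis is the tool the study uses to assess that trade-off.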

Trial registration is not applicable to this analysis. The PPMI study itself is registered with ClinicalTrials.gov (NCT01141023; registration date June 8, 2010).

The online version contains supplementary material available at 10.1186/s12911-025-03215-0.

## Linked entities

- **Diseases:** Parkinson’s disease (MONDO:0005180)

## Full-text entities

- **Diseases:** PD (MESH:D010300), depressive burden (MESH:D003866), Movement Disorder (MESH:D009069), hyposmia (MESH:D000086582), Cognitive impairment (MESH:D003072)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12522980/full.md

---
Source: https://tomesphere.com/paper/PMC12522980