# AI-assisted screening for mild cognitive impairment using routine EHR data: a Gradient Boosting approach

**Authors:** Tao Ye, Jianghua Peng

PMC · DOI: 10.3389/fneur.2026.1718791 · Frontiers in Neurology · 2026-02-17

## TL;DR

This study developed a Gradient Boosting model that uses routine EHR data to identify older outpatients with mild cognitive impairment (MCI). The model reached a test AUC of 0.850, supporting its potential for low-cost screening in primary care, pending external validation.

## Contribution

The study applies Gradient Boosting to routine structured EHR data for MCI screening in a Chinese outpatient population aged ≥60 years, reporting strong discrimination (test AUC 0.850) and good calibration on internal validation.

## Key findings

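- Gradient Boosting achieved a test AUC of 0.850 and accuracy of 0.833 for MCI detection; at the default threshold, sensitivity was only 0.286 (specificity 0.967).
- Important predictors included older age, female sex, lower education, smaller family size, and higher depression scores.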
- The model demonstrated good calibration and potential for automated screening, though external validation is needed.

## Abstract

This study aimed to develop and internally validate a machine learning (ML) model that identifies older outpatients with MCI using routine electronic health record (EHR) data.

We conducted a retrospective cross-sectional study of community outpatients aged ≥60 years in Zhejiang, China. Structured EHR predictors included demographics, comorbidities/medications, lifestyle, and visit patterns. The outcome was adjudicated MCI based on cognitive screening (MoCA plus supplemental tests). Supervised ML classifiers were compared using 10-fold cross-validation and an independent held-out test set; class imbalance was addressed with SMOTE. Performance was assessed by the area under the ROC curve (AUC) and by sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 score.
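The oversampling step can be illustrated with a minimal pure-Python sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. The function name `smote`, its parameters, and the example data are hypothetical; the abstract does not state the SMOTE settings used or whether oversampling was confined to the training folds, which is the usual way to keep synthetic samples out of validation data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours.

    Illustrative sketch only; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Made-up minority-class rows: (age in years, years of education)
minority = [(70.0, 6.0), (75.0, 9.0), (82.0, 3.0), (78.0, 5.0)]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two real minority samples, it stays inside the range of the observed minority data. In a cross-validation setting, a function like this would be called on the training portion of each fold only.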

The test set included ~640 patients (≈20% MCI). Gradient Boosting performed best: cross-validation mean AUC 0.855 (SD 0.031) and accuracy 0.862 (SD 0.013); test AUC 0.850, accuracy 0.833, and F1 0.402. At the default threshold, sensitivity was 0.286 and specificity 0.967 (PPV 0.679; NPV 0.847). Prioritizing sensitivity (~0.82) lowered specificity (~0.64). At a high-sensitivity threshold of 0.159, the model achieved a sensitivity of 0.802 with a specificity of 0.751 (PPV 0.441; NPV 0.939). Important predictors included older age, female sex, lower education, smaller family size, and higher depression scores.
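The relationship between the reported operating points can be checked with a short calculation. The sketch below derives PPV, NPV, accuracy, and F1 from sensitivity, specificity, and prevalence, working in population fractions. The function name `screening_metrics` is hypothetical, and the prevalence of 0.20 is the rounded figure from the abstract, so the derived values match the reported ones only to within roughly 0.01.

```python
def screening_metrics(sensitivity, specificity, prevalence):
    """Derive PPV, NPV, accuracy and F1 from sensitivity, specificity, prevalence."""
    tp = sensitivity * prevalence              # true positives (fraction of all patients)
    fn = (1 - sensitivity) * prevalence        # missed MCI cases
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false alarms
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": tp + tn,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Operating points as reported in the abstract, with prevalence rounded to 20%
default = screening_metrics(0.286, 0.967, 0.20)
high_sens = screening_metrics(0.802, 0.751, 0.20)
```

The calculation makes the trade-off explicit: with about 20% prevalence, the high accuracy at the default threshold is driven mostly by true negatives, while the low sensitivity keeps F1 near 0.40.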

An ML model using routine outpatient EHR data can discriminate MCI in older adults (AUC ≈ 0.85), supporting its potential for automated, low-cost screening in primary care. Using the predicted probabilities generated in this analysis, we assessed calibration and conducted a decision-curve analysis. While the model shows good discrimination and calibration, external validation is still required to confirm clinical utility and refine operating thresholds.
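Decision-curve analysis summarizes clinical utility as net benefit: the true-positive fraction minus the false-positive fraction weighted by the odds of the threshold probability. The sketch below applies that standard formula to the high-sensitivity operating point from the abstract. It assumes the classification cut-off of 0.159 also serves as the threshold probability and uses the rounded 20% prevalence; the function names are hypothetical, and the paper's own decision curves are not reproduced in this summary.

```python
def net_benefit(sensitivity, specificity, prevalence, threshold):
    """Net benefit = TP/n - FP/n * (pt / (1 - pt)) at threshold probability pt."""
    odds = threshold / (1 - threshold)
    tp = sensitivity * prevalence              # true-positive fraction of all patients
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction of all patients
    return tp - fp * odds


def net_benefit_treat_all(prevalence, threshold):
    """Reference strategy: refer every patient (sensitivity 1, specificity 0)."""
    return net_benefit(1.0, 0.0, prevalence, threshold)


# High-sensitivity operating point from the abstract; "refer nobody" has net benefit 0
nb_model = net_benefit(0.802, 0.751, 0.20, 0.159)
nb_all = net_benefit_treat_all(0.20, 0.159)
```

Under these assumptions the model's net benefit exceeds both reference strategies (refer everyone, refer no one) at this threshold, which is the pattern a favourable decision curve shows. A full analysis would repeat the calculation across a range of threshold probabilities.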

## Linked entities

- **Diseases:** depression (MONDO:0002050)

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** diabetes (MESH:D003920), MCI (MESH:D060825), ADRD (MESH:D000544), anxiety (MESH:D001007), conditions (MESH:D020763), hearing loss (MESH:D034381), white-matter hyperintensity (MESH:D056784), AI (MESH:C538142), stroke (MESH:D020521), ML (MESH:D007859), cerebrovascular disease (MESH:D002561), hypertension (MESH:D006973), cognitive complaint (MESH:D003072), Depressive symptoms (MESH:D003866), transient ischemic attack (MESH:D002546), dementia (MESH:D003704)
- **Chemicals:** alcohol (MESH:D000438)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12954454/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12954454/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12954454/full.md

---
Source: https://tomesphere.com/paper/PMC12954454