# Early prediction of Alzheimer’s disease using longitudinal electronic health records of US military veterans

**Authors:** Rumeng Li, Dan Berlowitz, Jesse Mez, Brian Silver, Xun Wang, Wen Hu, Raelene Goodwin, Heather Keating, Weisong Liu, Honghuang Lin, Hong Yu

PMC · DOI: 10.1038/s43856-025-01206-w · Communications Medicine · 2026-01-12

## TL;DR

This study shows that early signs of Alzheimer’s disease can be detected years before diagnosis by analyzing keywords in medical records, helping identify high-risk individuals.

## Contribution

The novel contribution is demonstrating that longitudinal clinical notes contain predictive signals for Alzheimer’s disease years before diagnosis, using keyword frequency analysis and machine learning.

## Key findings

- Alzheimer’s-related keywords appear more frequently in patients who later develop the disease, with exponential growth in mentions before diagnosis.
- A keyword-based random forest model achieves an AUC of 0.861 one day before diagnosis and 0.577 ten years before, outperforming structured data models.
- The predictive patterns are consistent across demographic subgroups, suggesting broad applicability of the approach.
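
The AUC values quoted above can be read as the probability that a randomly chosen future case receives a higher risk score than a randomly chosen control. A minimal sketch of that rank-based calculation follows. The function name `roc_auc` and the toy scores are illustrative assumptions, not the paper's code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score_case > score_control); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): 1 = later diagnosed, 0 = control.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why 0.577 at ten years indicates a weak but non-random signal.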

## Abstract

Early prediction of Alzheimer’s disease is important for timely intervention and treatment. We examine whether machine learning on longitudinal electronic health record notes can improve early prediction of Alzheimer’s disease.

From Veterans Health Administration records (2000 to 2022), we studied 61,537 individuals diagnosed with Alzheimer’s disease and 234,105 without, aged 45–103 years; 98.4% were male. From clinical notes, we quantified the frequency of subjective cognitive decline and Alzheimer’s disease-related keywords, and applied statistical machine learning models to assess their ability to predict future diagnosis.
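
As a rough illustration of how keyword frequency might be quantified from notes, the sketch below counts whole-word keyword mentions per calendar year for one patient. The two-word lexicon (taken from the examples in this abstract), the `(year, text)` note format, and the function name `mentions_per_year` are assumptions made for this example; the study's actual lexicon and text-processing pipeline are described in the full paper.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical mini-lexicon: only the two example keywords named in the abstract.
KEYWORDS: List[str] = ["concentration", "speaking"]

# Case-insensitive, whole-word match on any keyword.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE
)


def mentions_per_year(notes: Iterable[Tuple[int, str]]) -> Dict[int, int]:
    """Sum keyword mentions across all notes written in each year."""
    counts: Counter = Counter()
    for year, text in notes:
        counts[year] += len(PATTERN.findall(text))
    return dict(counts)


# Toy notes, invented for illustration only.
toy_notes = [
    (2010, "Patient reports trouble with concentration."),
    (2011, "Difficulty speaking at times; concentration poor. Speaking slowly."),
    (2011, "No new complaints."),
]
yearly = mentions_per_year(toy_notes)
```

Simple string matching of this kind does not handle negation ("denies trouble speaking") or word variants, so it should be read as an approximation of the signal rather than a clinical measure.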

Here we show that Alzheimer’s-related keywords (e.g., “concentration,” “speaking”) occur more often in notes of individuals who later develop Alzheimer’s disease than in controls. In the 15 years preceding diagnosis, cases demonstrate an exponential increase in keyword mentions (from 9.4 to 57.7 per year), whereas controls show a slower, linear increase (8.2 to 20.3). These trends are consistent across demographic subgroups. Random forest models using these keywords for prediction achieve an area under the receiver operating characteristic curve ranging from 0.577 at ten years before diagnosis to 0.861 one day before diagnosis, consistently outperforming models using only structured data.
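
To put the two trajectories on a comparable footing, the reported endpoints can be converted into an implied annual growth rate for cases and an implied yearly increment for controls. The sketch assumes the endpoints are exactly 15 years apart and that growth between them is perfectly exponential (cases) or perfectly linear (controls); the paper's fitted curves may differ.

```python
import math

YEARS = 15  # assumed span between the reported endpoints

case_start, case_end = 9.4, 57.7        # keyword mentions per year, cases
control_start, control_end = 8.2, 20.3  # keyword mentions per year, controls

# Exponential model: end = start * (1 + r) ** YEARS
case_annual_growth = (case_end / case_start) ** (1 / YEARS) - 1

# Doubling time under the same exponential model, in years
case_doubling_time = math.log(2) / math.log(1 + case_annual_growth)

# Linear model: end = start + slope * YEARS
control_slope = (control_end - control_start) / YEARS

print(f"cases: {case_annual_growth:.1%} per year, "
      f"doubling every {case_doubling_time:.1f} years")
print(f"controls: +{control_slope:.2f} mentions per year")
```

Under these assumptions, mentions in cases grow by roughly 13% per year, while controls gain a bit under one mention per year.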

Signs and symptoms of early Alzheimer’s disease are reported in clinical notes many years before a clinical diagnosis is made and the frequency of these signs and symptoms, approximated by keywords, increases the closer one is to the diagnosis. A simple keyword-based approach can capture these signals and can help identify individuals at high risk of future Alzheimer’s disease.

This study explored whether early signs of Alzheimer’s disease could be detected in routine medical records. We analyzed the health records of over 295,000 people from the U.S. Veterans Health Administration. We focused on words in doctors’ notes that reflect a wide range of early symptoms, including changes in memory, speech, cognition, mood, physical functioning, and daily activity needs. These signs appeared more frequently and increased more rapidly in people who were later diagnosed with Alzheimer’s. A computer model built on these words was able to predict who might develop the disease years in advance. These findings suggest that ordinary clinical notes could help doctors notice early warning signs of Alzheimer’s and support earlier care and planning.

Li et al. apply machine learning to longitudinal clinical notes to improve prediction of Alzheimer’s disease. They find that Alzheimer’s-related keywords occur more often in patients who later develop the disease, rising sharply before diagnosis and helping identify high-risk individuals.

## Linked entities

- **Diseases:** Alzheimer’s disease (MONDO:0004975)

## Full-text entities

- **Genes:** MAPT (microtubule associated protein tau) [NCBI Gene 4137] {aka DDPAC, FTD1, FTDP-17, MAPTL, MSTD, MTBT1}, CP (ceruloplasmin) [NCBI Gene 1356] {aka AB073614, CP-2}
- **Diseases:** Pain (MESH:D010146), skin lesion (MESH:D012871), anxiety (MESH:D001007), mood dysphoric (MESH:C565864), MCI (MESH:D060825), incontinence (MESH:D014549), itching (MESH:D011537), eczema (MESH:D004485), memory lapses (MESH:D008569), depression (MESH:D003866), cellulitis (MESH:D002481), AD (MESH:D000544), SCD (MESH:D003072), CP (MESH:D002972), sleep disturbances (MESH:D012893), dementia (MESH:D003704), -II (MESH:C537730), ulcer (MESH:D014456), Poor visual memory (MESH:D014786), rash (MESH:D005076), I (MESH:D006969), traumatic brain injury (MESH:D000070642), concentration difficulties (MESH:C567712), neuropsychiatric symptoms (MESH:D001523), delusion (MESH:D063726), neurodegenerative disorder (MESH:D019636), post-traumatic stress disorder (MESH:D013313)
- **Chemicals:** CP (-), donepezil (MESH:D000077265)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12796311/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12796311/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/PMC12796311/full.md

---
Source: https://tomesphere.com/paper/PMC12796311