LLM-MINE: Large Language Model based Alzheimer's Disease and Related Dementias Phenotypes Mining from Clinical Notes
Mingchen Shao, Yuzhang Xie, Carl Yang, Jiaying Lu

TL;DR
This paper introduces LLM-MINE, a framework that uses large language models to automatically extract Alzheimer's disease and related dementia phenotypes from unstructured clinical notes, supporting disease staging and cohort analysis.
Contribution
The paper presents a novel LLM-based approach for extracting ADRD phenotypes from clinical notes, outperforming traditional methods like NER and dictionary-based baselines.
Findings
Chi-square analyses confirm statistically significant phenotype differences across cohorts, with memory impairment the strongest discriminator.
Few-shot prompting yields the best clustering performance (ARI=0.290, NMI=0.232).
LLM-MINE effectively discovers meaningful ADRD signals from unstructured notes.
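For readers unfamiliar with the clustering metrics cited above, the following is a minimal sketch of how the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) compare a clustering against reference labels. It is not the authors' evaluation code. The function names and the toy label lists are hypothetical, and the NMI here uses arithmetic-mean normalization, which is an assumption since the summary does not say which variant the paper reports.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """ARI: pair-counting agreement between two partitions, corrected for chance."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    sum_ij = sum(comb(v, 2) for v in pair_counts.values())
    sum_a = sum(comb(v, 2) for v in counts_a.values())
    sum_b = sum(comb(v, 2) for v in counts_b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def normalized_mutual_info(labels_a, labels_b):
    """NMI with arithmetic-mean normalization (assumed variant)."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    mutual_info = sum(
        (nij / n) * log(nij * n / (counts_a[i] * counts_b[j]))
        for (i, j), nij in pair_counts.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in counts_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in counts_b.values())
    mean_entropy = (entropy_a + entropy_b) / 2
    if mean_entropy == 0:  # both partitions are a single cluster
        return 1.0
    return mutual_info / mean_entropy


# Hypothetical toy data: reference stages vs. cluster assignments for six patients.
reference_stage = [0, 0, 1, 1, 2, 2]
cluster_id = [1, 1, 0, 0, 2, 2]  # same grouping, different label names
ari = adjusted_rand_index(reference_stage, cluster_id)
nmi = normalized_mutual_info(reference_stage, cluster_id)
print(f"ARI={ari:.3f}, NMI={nmi:.3f}")
```

Both metrics ignore the names of the cluster labels and only compare the groupings, which is why they suit unsupervised staging where cluster IDs are arbitrary. A value near 0 indicates agreement at roughly chance level, and 1 indicates identical partitions.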
Abstract
Accurate extraction of Alzheimer's Disease and Related Dementias (ADRD) phenotypes from electronic health records (EHR) is critical for early-stage detection and disease staging. However, this information is usually embedded in unstructured text rather than tabular data, which makes it difficult to extract accurately. We therefore propose LLM-MINE, a Large Language Model-based phenotype mining framework for automatic extraction of ADRD phenotypes from clinical notes. Using two expert-defined phenotype lists, we evaluate the extracted phenotypes by examining their statistical significance across cohorts and their utility for unsupervised disease staging. Chi-square analyses confirm statistically significant phenotype differences across cohorts, with memory impairment being the strongest discriminator. Few-shot prompting with the combined phenotype lists achieves the best…
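The chi-square analysis mentioned in the abstract can be illustrated with a small sketch of Pearson's test of independence on a cohort-by-phenotype contingency table. This is an illustration under stated assumptions, not the authors' pipeline. The function name and the counts are hypothetical, no continuity correction is applied, and the p-value is computed only for one degree of freedom, where it has a closed form.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic for an r x c table of observed counts.

    Returns (statistic, degrees_of_freedom, p_value); p_value is computed
    only when there is one degree of freedom, otherwise it is None.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if cohort and phenotype were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None
    return stat, dof, p_value


# Hypothetical counts (not from the paper).
# Rows: two cohorts. Columns: phenotype mentioned / not mentioned in notes.
table = [
    [30, 10],
    [20, 40],
]
stat, dof, p_value = chi_square_independence(table)
print(f"chi2={stat:.3f}, dof={dof}, p={p_value:.2e}")
```

A small p-value suggests that how often the phenotype is mentioned depends on the cohort. With more than two cohorts the table simply gains rows, and a statistics library would be the usual way to obtain the p-value for the larger degrees of freedom.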
Taxonomy
Topics: Machine Learning in Healthcare · Biomedical Text Mining and Ontologies · Dementia and Cognitive Impairment Research
