# Comorbidity Classification from Clinical Free-Text using Large Language Models: Application to Sleep Disorder Patients

**Authors:** Yihan Deng, Fabio Dennstädt, Irina Filchenko, Julia van der Meer, Xiaoli Yang, Markus H. Schmidt, Claudio L. A. Bassetti, Athina Tzovara, Kerstin Denecke

PMC · DOI: 10.1007/s10916-026-02343-y · Journal of Medical Systems · 2026-02-19

## TL;DR

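This paper presents a large language model (LLM)-based framework for extracting comorbidities from clinical free text in sleep disorder patients. The instruction fine-tuned Mistral-24B model reaches 95% macro classification accuracy and a 92% F1 score across six common comorbidity classes.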

## Contribution

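An LLM-based framework that extracts comorbidities from diagnostic free text through a transparent hierarchical approach and handles various prompt formats and text sources (patient history, comorbidities, and sleep assessments).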

## Key findings

- The instruction fine-tuned Mistral-24B (Instruct-2501) model achieves 95% macro classification accuracy and a 92% F1 score across six common classes of comorbidities.
- The method supports transparent and hierarchical comorbidity extraction from diverse clinical texts.
- Performance is competitive with prior clinical NLP studies and complements transformer-based frameworks.

## Abstract

Patients presenting to neurology clinics commonly have a complex history of comorbidities and partially documented health trajectories, making it essential to reliably extract comorbidity information from historical records. However, existing extraction methods, ranging from rule-based systems to classical machine learning (ML), often have limited accuracy, scalability, or adaptability across diverse documents. We present a large language model (LLM)–based framework for comorbidity extraction from diagnostic texts, capable of handling various prompt formats and textual sources such as patient history, comorbidities, and sleep assessments. The instruction fine-tuned Mistral-24B (Instruct-2501) model achieves 95% macro classification accuracy and 92% F1 score across six common classes of comorbidities, achieving strong performance that is competitive with metrics reported in prior clinical phenotyping and information extraction studies, while complementing recent transformer-based clinical NLP frameworks. The proposed method extracts comorbidities through a transparent hierarchical approach, thereby supporting clinical analysis and providing interpretable insights for disease modeling and personalized treatment planning in sleep medicine.
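The macro-averaged figures quoted in the abstract can be illustrated with a small worked computation. This is a minimal sketch, not the authors' evaluation code: it assumes a multi-label setting in which each record carries a set of comorbidity classes, and it assumes "macro accuracy" means per-class (one-vs-rest) accuracy averaged over classes. The function name `macro_metrics` and the class names in the usage note are hypothetical.

```python
from typing import Dict, List, Set


def macro_metrics(
    gold: List[Set[str]],
    pred: List[Set[str]],
    classes: List[str],
) -> Dict[str, float]:
    """Macro-averaged accuracy and F1 for multi-label predictions.

    Assumption: each metric is computed per class (one-vs-rest) and then
    averaged with equal weight per class, regardless of class frequency.
    """
    if not gold or not classes or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")

    acc_sum = 0.0
    f1_sum = 0.0
    for label in classes:
        tp = fp = fn = tn = 0
        for g, p in zip(gold, pred):
            in_gold, in_pred = label in g, label in p
            if in_gold and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_gold:
                fn += 1
            else:
                tn += 1
        acc_sum += (tp + tn) / len(gold)
        denom = 2 * tp + fp + fn
        # Convention chosen here: F1 is 0 when the class never occurs
        # in either the gold labels or the predictions.
        f1_sum += (2 * tp / denom) if denom else 0.0

    n = len(classes)
    return {"macro_accuracy": acc_sum / n, "macro_f1": f1_sum / n}
```

For instance, with two hypothetical classes, `macro_metrics([{"hypertension"}, {"diabetes"}, set(), {"hypertension", "diabetes"}], [{"hypertension"}, set(), set(), {"hypertension", "diabetes"}], ["hypertension", "diabetes"])` averages a perfect score on one class with one missed case on the other. Because every class counts equally, a rare comorbidity that is classified poorly lowers the macro score as much as a common one would.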

## Linked entities

- **Diseases:** sleep disorder (MONDO:0003406)

## Full-text entities

- **Diseases:** cognitive impairment (MESH:D003072), weight loss (MESH:D015431), snoring (MESH:D012913), hypoxemia disorders (MESH:D000860), Mental Disorders (MESH:D001523), hypertensive heart/renal disease (MESH:D006977), Cardiovascular disease (MESH:D002318), insomnia (MESH:D007319), Atrial Fibrillation (MESH:D001281), Cerebral ischaemia (MESH:D002545), Sleep-disordered breathing (MESH:D012891), Diabetes (MESH:D003920), Ischemic stroke (MESH:D002544), Hypertension (MESH:D006973), glaucoma (MESH:D005901), Dyslipidemia (MESH:D050171), Cerebral ischaemic stroke (MESH:D020521), Central sleep apnoea (MESH:D020182), Type 1 diabetes mellitus (MESH:D003922), Sleep-related hypoventilation (MESH:D007040), Type 2 diabetes mellitus (MESH:D003924), OSA (MESH:D020181), Persistent AF (MESH:D000088562), Comorbidity (MESH:D004194), Transient ischaemic attack (MESH:D002546), Epilepsy (MESH:D004827)
- **Chemicals:** lipid (MESH:D008055), Rombos-14B (-), PAP (MESH:D010724)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12920282/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12920282/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12920282/full.md

---
Source: https://tomesphere.com/paper/PMC12920282