LLMs Do Not See Age: Assessing Demographic Bias in Automated Systematic Review Synthesis
Favour Yahdii Aghaebe, Tanefa Apekey, Elizabeth Williams, Nafise Sadat Moosavi

TL;DR
This study assesses whether state-of-the-art language models accurately retain age-related information in biomedical summaries, revealing disparities and biases that highlight the need for fairness-aware evaluation in biomedical NLP.
Contribution
We introduce DemogSummary, a novel age-stratified dataset, and evaluate LLMs' ability to preserve demographic information, highlighting systematic biases and limitations in current models.
Findings
Demographic fidelity is lowest for adult-focused summaries.
Under-represented populations are more prone to hallucinations.
Current LLMs show systematic disparities across age groups.
Abstract
Clinical interventions often hinge on age: medications and procedures safe for adults may be harmful to children or ineffective for older adults. However, as language models are increasingly integrated into biomedical evidence synthesis workflows, it remains uncertain whether these systems preserve such crucial demographic distinctions. To address this gap, we evaluate how well state-of-the-art language models retain age-related information when generating abstractive summaries of biomedical studies. We construct DemogSummary, a novel age-stratified dataset of systematic review primary studies, covering child, adult, and older adult populations. We evaluate three prominent summarisation-capable LLMs, Qwen (open-source), Longformer (open-source) and GPT-4.1 Nano (proprietary), using both standard metrics and a newly proposed Demographic Salience Score (DSS), which quantifies age-related…
Taxonomy
Topics: Meta-analysis and systematic reviews · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
