# LUMEN: Prototype Conversational AI to Streamline Dementia Assessments

**Authors:** Song Ling Tang, Alexander Robertson, Huizhi Liang, John-Paul Taylor, Judith Harrison

PMC · DOI: 10.1192/bjo.2025.10185 · 2025-06-20

## TL;DR

LUMEN is a conversational AI tool designed to help streamline dementia assessments by collecting caregiver input before appointments, potentially improving diagnostic accuracy and reducing clinician workload.

## Contribution

LUMEN introduces a prototype conversational AI for dementia assessments that combines stakeholder input with open-source language models, aiming to standardize caregiver assessments and improve diagnostic accuracy.

## Key findings

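- In simulated testing, LUMEN showed strong performance in distinguishing dementia from normal cognition (AUROC=0.89) but only moderate performance in differentiating dementia subtypes (AUROC=0.66).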
- Agreement between LUMEN and clinician evaluations was substantial (Cohen’s κ=0.82).
- System usability was rated as excellent (mean SUS score of 82/100).
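AUROC and Cohen's κ are standard metrics for this kind of comparison. As an illustration only, the sketch below computes binary AUROC (using its equivalence to the Mann-Whitney probability that a positive case outscores a negative one) and Cohen's kappa in plain Python. The function names, labels, and toy data are hypothetical. The abstract does not say how LUMEN's outputs were turned into scores or how the multi-class subtype AUROC was averaged. For real analyses, `sklearn.metrics.roc_auc_score` and `sklearn.metrics.cohen_kappa_score` are tested implementations.

```python
from collections import Counter
from itertools import product


def auroc(scores: list[float], labels: list[int]) -> float:
    """Binary AUROC as the probability that a random positive (label 1)
    scores higher than a random negative (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Observed agreement corrected for agreement expected by chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data (not study data): 1 = dementia, 0 = normal cognition.
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 1, 0, 1, 0]
tool = ["AD", "AD", "DLB", "Normal", "Normal"]
clinician = ["AD", "AD", "AD", "Normal", "Normal"]
print(round(auroc(scores, labels), 3), round(cohens_kappa(tool, clinician), 3))
# prints 0.833 0.667
```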

## Abstract

Aims: 
Dementia assessments are time-intensive and often distressing for patients and caregivers, and non-Alzheimer’s dementia subtypes remain frequently underdiagnosed. This study aimed to develop and evaluate LUMEN (Large Language Model for Understanding and Monitoring Elderly Neurocognition), a prototype conversational AI that automates the collection of collateral information from caregivers before clinical appointments. Our goals were to reduce clinician time per assessment, improve diagnostic accuracy across dementia subtypes, and standardise caregiver assessments.

Methods: 
LUMEN’s development followed a Patient, Public, and Professional Involvement (PPPI) process comprising stakeholder workshops, a modified Delphi process with 130 clinicians, and iterative consultations to identify key diagnostic priorities such as functional impairment, safety concerns, and inclusivity. Four open-source large language models (LLMs) with up to 7B parameters (Mistral, Llama2, Zephyr, and Phi2) were evaluated for efficiency (token count), readability (Flesch Reading Ease), and contextual relevance (cosine similarity to clinical dialogues). Mistral:7B was selected and fine-tuned using automated hyperparameter adjustment (GridSearchCV), advanced prompt engineering (chain-of-thought and flipped-classroom techniques), and BLEU-scored linguistic refinement. A prototype interface was tested on 16 clinician-simulated caregiver dialogues derived from case vignettes spanning dementia subtypes and normal cognition. LUMEN’s diagnostic outputs were compared with clinician-derived diagnoses using the area under the receiver operating characteristic curve (AUROC), and agreement was measured with Cohen’s kappa. Usability was assessed with the System Usability Scale (SUS).
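The three screening criteria can be illustrated with simple text metrics. This is a minimal sketch under stated assumptions, not the authors' pipeline: token count is approximated by a whitespace word count (each LLM's own tokenizer would give different numbers), syllables are estimated with a rough vowel-group heuristic rather than a validated counter, and cosine similarity is computed on bag-of-words count vectors because the abstract does not say which text representation was used. All function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def whitespace_token_count(text: str) -> int:
    """Length proxy; real LLM tokenizers split text differently."""
    return len(text.split())


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, drop a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between bag-of-words count vectors."""
    a = Counter(re.findall(r"[a-z']+", text_a.lower()))
    b = Counter(re.findall(r"[a-z']+", text_b.lower()))
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical reference dialogue and candidate model reply.
reference = ("She has been getting lost on familiar routes. "
             "She also sees people who are not there.")
candidate = ("Has she had trouble finding her way? "
             "Does she ever see things that others cannot?")
print(whitespace_token_count(candidate))
print(round(flesch_reading_ease(candidate), 1))
print(round(cosine_similarity(candidate, reference), 3))
```

Higher Flesch scores indicate easier text, which matters when the reader is a caregiver rather than a clinician.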

Results: 
LUMEN demonstrated strong performance in distinguishing dementia from normal cognition (AUROC=0.89) but only moderate performance in differentiating subtypes (AUROC=0.66). Agreement between LUMEN and clinician evaluations was substantial (Cohen’s κ=0.82). However, identification of dementia with Lewy bodies (DLB) lagged because of inaccuracies in symptom reporting. The mean System Usability Scale (SUS) score (82/100) exceeded the ‘excellent’ threshold (≥80). PPPI feedback highlighted LUMEN’s potential to standardise assessment and reduce waiting times.
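The SUS figure comes from the standard 10-item questionnaire, scored on a 0 to 100 scale. The sketch below applies the usual scoring rule: odd (positively worded) items contribute the response minus 1, even (negatively worded) items contribute 5 minus the response, and the 0 to 40 sum is multiplied by 2.5. The rater responses are invented for illustration and are not study data. The ≥80 "excellent" cut-off is taken from the abstract; cut-offs vary between published interpretation schemes.

```python
from statistics import mean


def sus_score(responses: list[int]) -> float:
    """Score one completed 10-item System Usability Scale questionnaire.

    Each item is answered on a 1-5 scale. Odd-numbered items contribute
    (response - 1); even-numbered items contribute (5 - response).
    The 0-40 sum is scaled to 0-100 by multiplying by 2.5.
    """
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs exactly 10 responses, each between 1 and 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5


# Hypothetical responses from three raters (illustrative only).
raters = [
    [5, 1, 5, 2, 4, 1, 5, 1, 4, 2],
    [4, 2, 4, 1, 5, 2, 4, 2, 5, 1],
    [4, 2, 5, 2, 4, 1, 4, 2, 4, 2],
]
EXCELLENT_THRESHOLD = 80  # cut-off as reported in the abstract

scores = [sus_score(r) for r in raters]
mean_score = mean(scores)
print(scores, mean_score, mean_score >= EXCELLENT_THRESHOLD)
# prints [90.0, 85.0, 80.0] 85.0 True
```

Reporting the mean across raters, as the abstract does, hides individual spread; listing per-rater scores alongside it shows whether any single rater fell below the threshold.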

Conclusion: LUMEN is a promising conversational AI tool for improving dementia diagnostics. Gathering caregivers’ collateral input before appointments could streamline workflows within existing outpatient systems and improve clinical accuracy. Real-world trials would help assess workflow integration and mitigate vignette-based biases from simulated testing, such as the overrepresentation of typical phenotypes.

This study was conducted in collaboration with Mr Bede Burston, Dr Elizabeth Robertson, and Dr Donncha Mullin, whose contributions were invaluable.

## Linked entities

- **Diseases:** dementia (MONDO:0001627), Alzheimer’s disease (MONDO:0004975), Lewy body dementia (MONDO:0007488), DLB (MONDO:0007488)

---
Source: https://tomesphere.com/paper/PMC12242181