Blinded Multi-Rater Comparative Evaluation of a Large Language Model and Clinician-Authored Responses in CGM-Informed Diabetes Counseling

Zhijun Guo; Alvina Lai; Emmanouil Korakas; Aristeidis Vagenas; Irshad Ahamed; Christo Albor; Hengrui Zhang; Justin Healy; Kezhi Li

arXiv:2604.15124 · cs.CL · April 17, 2026

TL;DR

This study evaluates a retrieval-grounded large language model as a supportive tool for CGM-informed diabetes counseling, showing that it can generate empathetic, actionable responses that were rated higher in perceived quality than clinician-authored responses in a blinded assessment.

Contribution

The paper introduces a retrieval-grounded LLM-based conversational agent for CGM interpretation and reports that its responses received higher quality ratings than clinician-authored ones, suggesting potential as an adjunct in diabetes counseling.

Findings

01

The LLM system's responses received higher quality scores than clinician-authored responses.

02

Safety concerns were rare and comparable between LLM-generated and clinician-authored responses.

03

The system showed promise for patient education and preconsultation support.

Abstract

Continuous glucose monitoring (CGM) is central to diabetes care, but explaining CGM patterns clearly and empathetically is time-intensive. Evidence for retrieval-grounded large language model (LLM) systems in CGM-informed counseling remains limited. This study aimed to evaluate whether a retrieval-grounded LLM-based conversational agent (CA) could support patient understanding of CGM data and preparation for routine diabetes consultations. We developed a retrieval-grounded LLM-based CA for CGM interpretation and diabetes counseling support. The system generated plain-language responses while avoiding individualized therapeutic advice. Twelve CGM-informed cases were constructed from publicly available datasets. Between October 2025 and February 2026, 6 senior UK diabetes clinicians each reviewed 2 assigned cases and answered 24 questions. In a blinded multi-rater evaluation, each CA-generated and…
