Synthesis and Evaluation of Long-term History-aware Medical Dialogue

Hebin Hu; Renke Dai; Ah-Hwee Tan; Yilin Kang

arXiv:2605.19766·cs.CL·May 20, 2026

TL;DR

This paper presents a framework for synthesizing and evaluating long-term, history-aware medical dialogues using LLMs, addressing the lack of realistic datasets for healthcare agent development.

Contribution

It introduces MediLongChat, a high-quality synthetic dataset of long-term medical dialogues, together with benchmark tasks and an evaluation framework for the memory capabilities of healthcare dialogue agents.

Findings

01. State-of-the-art LLMs struggle with MediLongChat.

02. The dataset enables systematic evaluation of long-term medical dialogue reasoning.

03. Multi-dimensional metrics effectively assess data quality and model performance.

Abstract

An effective healthcare agent must be able to recall and reason over a patient's longitudinal medical history. However, the absence of datasets with realistic long-term dialogue timelines limits systematic evaluation. Real clinical text is constrained by privacy and ethics, while existing benchmarks focus on isolated interactions, failing to capture cross-session reasoning. We introduce a framework for synthesizing high-quality, long-term medical dialogues with LLMs. Our approach entails a knowledge-guided decomposition into three stages: constructing synthetic patient profiles with diverse disease and complication trajectories, generating multi-turn dialogues per encounter, and integrating them into a coherent longitudinal history dataset, MediLongChat. We establish three benchmark tasks (In-dialogue Reasoning, Cross-dialogue Reasoning, and Synthesis Reasoning) to evaluate the memory…
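The three-stage decomposition described in the abstract can be pictured with a small sketch. This is not the authors' implementation: every name below (`Turn`, `Encounter`, `PatientHistory`, `build_profile`, `generate_dialogue`, `integrate`, `synthesize`, `stub_generator`) is hypothetical, the LLM call is replaced by a fixed stub, and the knowledge-guided aspects of the real pipeline are omitted. It only illustrates how profiles, per-encounter dialogues, and a chronological history might fit together.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Turn:
    speaker: str  # "patient" or "doctor"
    text: str


@dataclass
class Encounter:
    visit_date: date
    condition: str
    turns: list[Turn] = field(default_factory=list)


@dataclass
class PatientHistory:
    patient_id: str
    encounters: list[Encounter] = field(default_factory=list)


def build_profile(patient_id, trajectory):
    """Stage 1 (assumed shape): a profile is a list of dated conditions."""
    return PatientHistory(
        patient_id=patient_id,
        encounters=[Encounter(visit_date=d, condition=c) for d, c in trajectory],
    )


def generate_dialogue(encounter, generate):
    """Stage 2: fill one encounter with a multi-turn dialogue.

    `generate` stands in for an LLM call; it is any callable that maps
    an Encounter to an iterable of Turn objects.
    """
    encounter.turns = list(generate(encounter))
    return encounter


def integrate(history):
    """Stage 3: order encounters chronologically into one timeline."""
    history.encounters.sort(key=lambda e: e.visit_date)
    return history


def stub_generator(encounter):
    # Hypothetical placeholder for an LLM: a fixed two-turn exchange.
    return [
        Turn("patient", f"I am here about my {encounter.condition}."),
        Turn("doctor", f"Let's review your {encounter.condition}."),
    ]


def synthesize(patient_id, trajectory, generate=stub_generator):
    """Run the three stages in order for a single synthetic patient."""
    history = build_profile(patient_id, trajectory)
    for enc in history.encounters:
        generate_dialogue(enc, generate)
    return integrate(history)
```

For example, `synthesize("p1", [(date(2024, 5, 1), "diabetic neuropathy"), (date(2023, 1, 10), "type 2 diabetes")])` returns a history whose encounters are sorted by visit date, each carrying its own dialogue turns. A cross-session question would then need to draw on more than one encounter in that history.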
