Evolving Interactive Diagnostic Agents in a Virtual Clinical Environment

Pengcheng Qiu; Chaoyi Wu; Junwei Liu; Qiaoyu Zheng; Yusheng Liao; Haowen Wang; Yun Yue; Qianrui Fan; Shuai Zhen; Jian Wang; Jinjie Gu; Yanfeng Wang; Ya Zhang; and Weidi Xie

arXiv:2510.24654 · cs.CL · February 11, 2026

TL;DR

This paper introduces a reinforcement learning framework for training large language models as interactive diagnostic agents in virtual clinical environments, improving multi-turn diagnostic accuracy and examination strategies.

Contribution

The paper presents DiagGym, DiagAgent, and DiagBench, which together enable, for the first time, dynamic training, evaluation, and benchmarking of interactive diagnostic policies.

Findings

01 DiagAgent outperforms 11 SOTA LLMs and 2 prompt-engineered agents.

02 It achieves 11.20% higher diagnostic accuracy.

03 It boosts the examination recommendation F1 score by 17.58%.
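For readers unfamiliar with the metric, the examination recommendation F1 score can be illustrated with a set-based calculation. This is a minimal sketch that assumes F1 is computed between the set of examinations an agent recommends and a reference set for the same case; the function name `exam_f1` and the sample examination names are hypothetical, and the paper's exact definition may differ.

```python
def exam_f1(recommended, reference):
    """Set-based F1 between recommended and reference examinations.

    Assumption: each examination is identified by a plain string name and
    duplicates are ignored. This is an illustration, not the paper's code.
    """
    rec, ref = set(recommended), set(reference)
    true_positives = len(rec & ref)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap.
        return 0.0
    precision = true_positives / len(rec)  # share of recommendations that were needed
    recall = true_positives / len(ref)     # share of needed exams that were recommended
    return 2 * precision * recall / (precision + recall)


# Hypothetical case: three exams recommended, two of them in the reference set.
score = exam_f1(["CBC", "abdominal CT", "brain MRI"], ["CBC", "abdominal CT"])
```

Under this definition, recommending unnecessary examinations lowers precision and missing required ones lowers recall, so the score rewards an examination strategy that is both complete and economical.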

Abstract

We present a framework for training large language models (LLMs) as diagnostic agents with reinforcement learning, enabling them to manage multi-turn interactive diagnostic processes, adaptively select examinations, and commit to final diagnoses. Unlike instruction-tuned models trained on static data, our method acquires diagnostic strategies through dynamic exploration and outcome-based feedback, mapping evolving patient states to the next optimal examination and subsequent diagnosis. Our contributions include: (i) DiagGym, a diagnostics world model trained with electronic health records, serving as a virtual clinical environment to support closed-loop in-silico training and evaluation for interactive diagnosis; (ii) DiagAgent, trained via end-to-end multi-turn RL to learn dynamic diagnostic policies that optimize both interactive effectiveness and final accuracy; (iii) DiagBench, a…
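The closed-loop interaction described in the abstract can be sketched as a simple episode: the agent repeatedly chooses an examination, the environment returns a simulated result, and the episode ends when the agent commits to a diagnosis and receives an outcome-based reward. Everything below is a toy illustration under stated assumptions: `ToyDiagEnv`, `rule_based_policy`, and `run_episode` are hypothetical names, the environment is a lookup table rather than a learned world model, and the policy is hand-written rather than trained with RL. None of it reflects the actual DiagGym or DiagAgent interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDiagEnv:
    """Hypothetical stand-in for a virtual clinical environment (not DiagGym's API)."""
    exam_results: dict    # examination name -> simulated result text
    true_diagnosis: str   # ground-truth label used only for the final reward
    history: list = field(default_factory=list)  # (exam, result) pairs seen so far

    def request_exam(self, exam: str) -> str:
        """Return the simulated result for an examination and record it."""
        result = self.exam_results.get(exam, "not available")
        self.history.append((exam, result))
        return result

    def outcome_reward(self, diagnosis: str) -> float:
        """Outcome-based feedback: 1.0 for a correct final diagnosis, else 0.0."""
        return 1.0 if diagnosis == self.true_diagnosis else 0.0


def rule_based_policy(history):
    """Toy policy mapping the interaction history to the next action.

    Returns ("exam", name) to order an examination or ("diagnose", label)
    to commit to a final diagnosis. A trained agent would replace this.
    """
    seen = dict(history)
    if "CBC" not in seen:
        return ("exam", "CBC")
    if "elevated WBC" in seen["CBC"] and "abdominal CT" not in seen:
        return ("exam", "abdominal CT")
    if "inflamed appendix" in seen.get("abdominal CT", ""):
        return ("diagnose", "appendicitis")
    return ("diagnose", "unknown")


def run_episode(env, policy, max_turns=5):
    """Run one multi-turn episode and return (diagnosis, reward)."""
    for _ in range(max_turns):
        action, value = policy(env.history)
        if action == "diagnose":
            return value, env.outcome_reward(value)
        env.request_exam(value)
    # Turn budget exhausted without a diagnosis: no reward.
    return None, 0.0


env = ToyDiagEnv(
    exam_results={"CBC": "elevated WBC", "abdominal CT": "inflamed appendix"},
    true_diagnosis="appendicitis",
)
diagnosis, reward = run_episode(env, rule_based_policy)
```

In an RL setting, the reward returned at the end of many such episodes would be the signal used to update the policy; here the policy is fixed, so the sketch only shows the shape of the interaction loop.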

Code & Models

Datasets

Henrychur/DiagBench
