Aligning Medical Conversational AI through Online Reinforcement Learning with Information-Theoretic Rewards

Tanvi Verma; Yang Zhou; Rick Siow Mong Goh; Yong Liu

arXiv:2601.17828 · cs.AI · January 27, 2026

TL;DR

This paper introduces IGFT, an online reinforcement learning method with information-theoretic rewards, to train medical conversational AI models that effectively conduct patient interviews without relying on pre-existing annotated conversations.

Contribution

The paper presents a novel online RL framework with information gain rewards for training medical dialogue models, enabling them to learn effective questioning strategies through self-exploration.

Findings

01. IGFT improves F1 scores on Avey and MIMIC datasets by over 10%.

02. Models outperform existing medical QA systems on multi-turn conversations.

03. The approach enables effective training without expensive annotated data.

Abstract

We present Information Gain Fine-Tuning (IGFT), a novel approach for training medical conversational AI to conduct effective patient interviews and generate a comprehensive History of Present Illness (HPI) without requiring pre-collected human conversations. IGFT combines online Group Relative Policy Optimization (GRPO) with information-theoretic rewards, enabling models to learn from self-generated conversations with simulated patients. Unlike existing approaches that rely on expensive expert-annotated conversations or static datasets, our online RL framework allows models to discover effective questioning strategies through exploration. Our key innovation is an information gain reward function that tracks which clinical entities, such as symptoms, temporal patterns, and medical history, are revealed during the conversation. Each question's reward is computed based on its expected information…
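
The abstract is cut off before it defines the reward, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a simplified reward equal to the fraction of target clinical entities that a simulated patient's answer newly reveals, and it applies the standard GRPO step of normalizing rewards within a group of sampled candidates. The function names, the entity strings, and the use of realized rather than expected gain are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def information_gain_reward(known, revealed, target):
    """Hypothetical reward: share of target entities newly revealed by one answer.

    known    -- entities already uncovered earlier in the conversation
    revealed -- entities mentioned in the simulated patient's latest answer
    target   -- all entities in the (hidden) patient record
    """
    if not target:
        return 0.0
    newly_revealed = (revealed & target) - known
    return len(newly_revealed) / len(target)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy patient record and conversation state (made-up example data).
target = {"chest pain", "onset 2 days ago", "smoker", "shortness of breath"}
known = {"chest pain"}

# Entities revealed by the answers to three candidate questions.
candidate_answers = [
    {"onset 2 days ago", "shortness of breath"},  # two new entities
    {"chest pain"},                               # nothing new
    {"smoker"},                                   # one new entity
]

rewards = [information_gain_reward(known, a, target) for a in candidate_answers]
advantages = group_relative_advantages(rewards)
```

In this toy setup the question that uncovers two new entities receives the largest advantage and the redundant question the smallest, which is the kind of signal that would push a policy toward informative questions.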

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Topic Modeling