Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

Hanyin Wang; Zhenbang Wu; Gururaj Kolar; Hariprasad Korsapati; Brian Bartlett; Bryan Hull; Jimeng Sun

arXiv:2505.21908 · cs.LG · October 16, 2025

TL;DR

This paper presents DRG-Sapphire, a reinforcement learning (RL)-based model that improves out-of-distribution (OOD) Diagnosis-Related Group (DRG) coding from clinical notes. It achieves state-of-the-art accuracy and better explainability, and the paper also analyzes the limitations of RL in knowledge-intensive tasks.

Contribution

Introduces DRG-Sapphire, a novel RL approach with rule-based rewards for DRG coding, demonstrating improved accuracy and explainability on an OOD clinical task.

Findings

1. RL performance scales approximately linearly with the logarithm of the number of supervised fine-tuning (SFT) examples.
2. Strong RL performance requires sufficient domain knowledge in the base model.
3. Scaling supervised fine-tuning is more effective than scaling RL alone.
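
The first finding describes a log-linear trend. As a minimal sketch, assuming the relationship takes the form score ≈ a + b·ln(N), where N is the number of fine-tuning examples and a, b are fitted constants (our notation, not the paper's), the coefficients can be estimated with ordinary least squares on ln(N). The helper names `fit_log_linear` and `predict` are hypothetical, and the demo data are synthetic points generated from a known line, not results from the paper.

```python
import math


def fit_log_linear(n_examples, scores):
    """Ordinary least-squares fit of score = a + b * ln(n). Returns (a, b)."""
    if len(n_examples) != len(scores) or len(n_examples) < 2:
        raise ValueError("need at least two (n, score) pairs of equal length")
    xs = [math.log(n) for n in n_examples]  # regress on ln(N), not N
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("n_examples must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx               # slope: score gained per unit of ln(N)
    a = mean_y - b * mean_x     # intercept
    return a, b


def predict(a, b, n):
    """Score predicted by the fitted log-linear model for n examples."""
    return a + b * math.log(n)


if __name__ == "__main__":
    # Synthetic points from a = 10, b = 5 (illustrative only, not the paper's data).
    ns = [1_000, 2_000, 4_000, 8_000, 16_000]
    ys = [10 + 5 * math.log(n) for n in ns]
    a, b = fit_log_linear(ns, ys)
    print(f"a={a:.3f}, b={b:.3f}, predicted at 32k: {predict(a, b, 32_000):.3f}")
```

Under this model each doubling of N adds the same increment, b·ln 2, to the score, so gains that look like diminishing returns on a linear axis appear as a straight line on a log axis.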

Abstract

Diagnosis-Related Group (DRG) codes are essential for hospital reimbursement and operations but require labor-intensive assignment. Large Language Models (LLMs) struggle with DRG coding due to the out-of-distribution (OOD) nature of the task: pretraining corpora rarely contain private clinical or billing data. We introduce DRG-Sapphire, which uses large-scale reinforcement learning (RL) for automated DRG coding from clinical notes. Built on Qwen2.5-7B and trained with Group Relative Policy Optimization (GRPO) using rule-based rewards, DRG-Sapphire introduces a series of RL enhancements to address domain-specific challenges not seen in previous mathematical tasks. Our model achieves state-of-the-art accuracy on the MIMIC-IV benchmark and generates physician-validated reasoning for DRG assignments, significantly enhancing explainability. Our study further sheds light on broader challenges…
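
The abstract names GRPO with rule-based rewards. The sketch below illustrates the general mechanism under stated assumptions; it is not DRG-Sapphire's actual reward design, which this page does not describe. It assumes a hypothetical output format in which the model wraps its final DRG code in `<answer>` tags, and a hypothetical exact-match reward (1.0 if the extracted code equals the reference, else 0.0). GRPO then samples a group of completions per clinical note and scores each one relative to its group, (r_i - mean(r)) / (std(r) + ε), instead of relying on a learned value function. The function names and the placeholder code "871" are illustrative only.

```python
import re
import statistics

# Hypothetical output format: the final DRG code is wrapped in <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def rule_based_reward(completion: str, gold_drg: str) -> float:
    """Hypothetical exact-match reward: 1.0 if the extracted code equals gold, else 0.0.

    Completions without an answer tag (malformed output) receive 0.0.
    """
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    predicted = match.group(1).strip().upper()
    return 1.0 if predicted == gold_drg.strip().upper() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages computed within one prompt's group of samples.

    Uses the population standard deviation; some implementations use the
    sample standard deviation instead. eps avoids division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    gold = "871"  # placeholder reference code for one note
    group = [
        "<answer>871</answer>",
        "<answer>872</answer>",
        "reasoning without a final tag",
        "<answer> 871 </answer>",
    ]
    rewards = [rule_based_reward(c, gold) for c in group]
    print(rewards)                             # [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))  # roughly [1, -1, -1, 1]
```

Because advantages are centered within each group, a note on which every sampled completion receives the same reward contributes no learning signal; with a sparse exact-match reward this happens whenever all samples are right or all are wrong.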

Taxonomy

Topics: Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning · Topic Modeling