Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding
Hanyin Wang, Zhenbang Wu, Gururaj Kolar, Hariprasad Korsapati, Brian Bartlett, Bryan Hull, Jimeng Sun

TL;DR
This paper presents DRG-Sapphire, a reinforcement learning-based model for out-of-distribution DRG coding from clinical notes. The model achieves state-of-the-art accuracy and better explainability, and the paper analyzes RL's limitations in knowledge-intensive tasks.
Contribution
Introduces DRG-Sapphire, a novel RL approach with rule-based rewards for DRG coding, demonstrating improved accuracy and explainability in OOD clinical tasks.
Findings
RL performance scales with the logarithm of the number of fine-tuning examples
Strong RL performance requires sufficient domain knowledge in the base model
Scaling supervised fine-tuning is more effective than scaling RL alone
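The log-scaling finding above can be written as a simple functional form. This is only an illustrative sketch: the symbols \(\mathrm{Acc}\), \(N\), \(\alpha\), and \(\beta\) are assumptions introduced here, and this summary does not report fitted coefficients.

```latex
% Assumed notation: N = number of fine-tuning examples,
% Acc(N) = post-RL accuracy, alpha and beta = unspecified fitted constants.
\mathrm{Acc}(N) \approx \alpha + \beta \log N
\quad\Longrightarrow\quad
\mathrm{Acc}(2N) - \mathrm{Acc}(N) \approx \beta \log 2
```

Under this form, each doubling of the fine-tuning set adds roughly the same fixed increment in accuracy, so equal absolute gains require exponentially more data.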
Abstract
Diagnosis-Related Group (DRG) codes are essential for hospital reimbursement and operations but require labor-intensive assignment. Large Language Models (LLMs) struggle with DRG coding due to the out-of-distribution (OOD) nature of the task: pretraining corpora rarely contain private clinical or billing data. We introduce DRG-Sapphire, which uses large-scale reinforcement learning (RL) for automated DRG coding from clinical notes. Built on Qwen2.5-7B and trained with Group Relative Policy Optimization (GRPO) using rule-based rewards, DRG-Sapphire introduces a series of RL enhancements to address domain-specific challenges not seen in previous mathematical tasks. Our model achieves state-of-the-art accuracy on the MIMIC-IV benchmark and generates physician-validated reasoning for DRG assignments, significantly enhancing explainability. Our study further sheds light on broader challenges…
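As a rough illustration of how GRPO can be combined with a rule-based reward, the sketch below scores a group of sampled completions for one clinical note and converts the rewards into group-relative advantages. It is a minimal sketch under stated assumptions, not the paper's implementation: the `<answer>` tag format, the binary reward values, the function names, and the use of the population standard deviation are all hypothetical choices made for this example.

```python
import re
from statistics import mean, pstdev

# Hypothetical output format: the final DRG is wrapped in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def rule_based_reward(completion: str, gold_drg: str) -> float:
    """Return 1.0 for a well-formed, exactly matching DRG; otherwise 0.0.

    The binary scheme is an assumption for illustration only.
    """
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # malformed output: no parsable answer
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == gold_drg.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group (the core GRPO idea).

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (population std assumed here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: four sampled completions for a single note (made-up strings).
completions = [
    "<think>...</think><answer>SEPSIS W MCC</answer>",
    "<think>...</think><answer>HEART FAILURE W CC</answer>",
    "no tags at all",
    "<think>...</think><answer>sepsis w mcc</answer>",
]
rewards = [rule_based_reward(c, "SEPSIS W MCC") for c in completions]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the group, a group in which every completion receives the same reward yields all-zero advantages and therefore contributes no learning signal for that note.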
Taxonomy
Topics: Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning · Topic Modeling
