Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports
Yishu Wei, Yi Lin, Adam Flanders, George Shih, Yifan Peng

TL;DR
This paper introduces a two-stage approach, supervised fine-tuning followed by reinforcement learning, that improves both the accuracy and the reasoning of large language models classifying diseases from radiology reports.
Contribution
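It combines supervised fine-tuning (SFT) on disease labels with Group Relative Policy Optimization (GRPO), a reinforcement learning method that rewards accuracy and output format, to improve both classification and reasoning without requiring explicit reasoning supervision.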
Findings
SFT outperforms baseline models in disease classification.
GRPO further improves classification accuracy.
GRPO enhances reasoning recall and comprehensiveness.
Abstract
Accurate disease classification from radiology reports is essential for many applications. While supervised fine-tuning (SFT) of lightweight LLMs improves accuracy, it can degrade reasoning. We propose a two-stage approach: SFT on disease labels followed by Group Relative Policy Optimization (GRPO) to refine predictions by optimizing accuracy and format without reasoning supervision. Across three radiologist-annotated datasets, SFT outperformed baselines and GRPO further improved classification and enhanced reasoning recall and comprehensiveness.
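The GRPO stage described above can be illustrated with a minimal sketch of its two ingredients: a reward that scores accuracy and format only, and the group-relative advantage that standardizes rewards across completions sampled for the same report. This is an illustration under stated assumptions, not the authors' implementation. The `<think>`/`<answer>` output format, the comma-separated label list, the exact-match accuracy criterion, the equal reward weights, and all function names are hypothetical choices that the abstract does not specify.

```python
import re
from statistics import mean, pstdev

# Assumed (hypothetical) output format: free-text reasoning inside <think> tags,
# followed by comma-separated disease labels inside <answer> tags.
PATTERN = re.compile(r"<think>(.+?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed format, else 0.0."""
    return 1.0 if PATTERN.fullmatch(completion.strip()) else 0.0


def accuracy_reward(completion: str, gold_labels: set) -> float:
    """Return 1.0 if the predicted label set equals the gold set, else 0.0.

    Only the final labels are scored; the reasoning text is never compared
    against a reference, so no reasoning supervision is involved.
    """
    match = PATTERN.fullmatch(completion.strip())
    if match is None:
        return 0.0
    predicted = {p.strip().lower() for p in match.group(2).split(",") if p.strip()}
    return 1.0 if predicted == {g.lower() for g in gold_labels} else 0.0


def total_reward(completion: str, gold_labels: set) -> float:
    """Sum of accuracy and format rewards (equal weighting is an assumption)."""
    return accuracy_reward(completion, gold_labels) + format_reward(completion)


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Standardize rewards within one group of completions for the same prompt.

    GRPO uses this group-relative score in place of a learned value baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In use, several completions would be sampled for one report, scored with `total_reward`, and passed as a group to `group_relative_advantages`. Completions that are both well-formed and correct receive positive advantages, while malformed or incorrect ones receive negative advantages, which is what steers the policy update.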
