Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports

Yishu Wei; Yi Lin; Adam Flanders; George Shih; Yifan Peng

arXiv:2604.19060 · cs.AI · April 22, 2026

TL;DR

This paper introduces a two-stage approach, supervised fine-tuning followed by reinforcement learning, that improves both the accuracy and the reasoning of large language models when classifying diseases from radiology reports.

Contribution

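It combines supervised fine-tuning (SFT) on disease labels with Group Relative Policy Optimization (GRPO), a reinforcement learning method that here rewards accuracy and output format, to improve both classification and reasoning without requiring explicit reasoning supervision.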

Findings

01

Across three radiologist-annotated datasets, SFT outperforms baseline models in disease classification.

02

GRPO applied after SFT further improves classification accuracy.

03

GRPO enhances reasoning recall and comprehensiveness.

Abstract

Accurate disease classification from radiology reports is essential for many applications. While supervised fine-tuning (SFT) of lightweight LLMs improves accuracy, it can degrade reasoning. We propose a two-stage approach: SFT on disease labels followed by Group Relative Policy Optimization (GRPO) to refine predictions by optimizing accuracy and format without reasoning supervision. Across three radiologist-annotated datasets, SFT outperformed baselines and GRPO further improved classification and enhanced reasoning recall and comprehensiveness.
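
To make the second stage more concrete, the sketch below illustrates the general idea of a GRPO-style reward that scores only accuracy and format, followed by the group-relative normalization that gives GRPO its name. It is an illustration under stated assumptions, not the authors' implementation: the `<think>`/`<answer>` output format, the exact-match accuracy criterion, the `format_weight` value, and all function names are hypothetical, and the policy update itself (clipped objective, KL penalty) is omitted.

```python
import re
from statistics import mean, pstdev

# Assumed (hypothetical) output format: free-text reasoning inside <think> tags,
# then a comma-separated list of disease labels inside <answer> tags.
FORMAT_RE = re.compile(r"^<think>.+?</think>\s*<answer>(.*?)</answer>$", re.DOTALL)


def parse_labels(completion):
    """Return the set of predicted labels, or None if the format is violated."""
    match = FORMAT_RE.match(completion.strip())
    if match is None:
        return None
    return {part.strip().lower() for part in match.group(1).split(",") if part.strip()}


def reward(completion, gold_labels, format_weight=0.2):
    """Format reward plus accuracy reward; the reasoning text is never scored."""
    predicted = parse_labels(completion)
    if predicted is None:
        return 0.0  # malformed output earns nothing
    gold = {label.lower() for label in gold_labels}
    accuracy = 1.0 if predicted == gold else 0.0  # assumed exact-match criterion
    return format_weight + accuracy


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of completions sampled for the same (made-up) report.
group = [
    "<think>The report describes a right lower lobe consolidation.</think>"
    "<answer>pneumonia</answer>",
    "<think>Possible fluid overload.</think><answer>edema</answer>",
    "pneumonia",  # correct label, but the required format is missing
]
rewards = [reward(c, ["pneumonia"]) for c in group]
advantages = group_relative_advantages(rewards)
```

In this sketch, completions that score above their group's mean receive positive advantages and are reinforced, while those below the mean are discouraged. Because the reward inspects only the label set and the output structure, any change in reasoning quality arises indirectly, which matches the abstract's description of training without reasoning supervision.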
