Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization
Boyang Gu, Hongjian Zhou, Bradley Max Segal, Jinge Wu, Zeyu Cao, Hantao Zhong, Lei Clifton, Fenglin Liu, David A. Clifton

TL;DR
This paper introduces CRPO, a multi-objective reinforcement learning method that improves the faithfulness and comprehensiveness of large language models in clinical reasoning, surpassing previous methods in truthfulness and completeness.
Contribution
We propose CRPO, a scalable, multi-objective, verifiable RL approach that aligns LLMs with clinical reasoning principles without human annotations, demonstrated on a 3B model.
Findings
CRPO significantly improves truthfulness and completeness in clinical reasoning tasks.
The 3B Clinical-R1 model trained with CRPO outperforms standard methods on benchmarks.
CRPO maintains accuracy while enhancing faithfulness and comprehensiveness.
Abstract
Recent advances in large language models (LLMs) have shown strong reasoning capabilities through large-scale pretraining and post-training reinforcement learning, as demonstrated by DeepSeek-R1. However, current post-training methods, such as Group Relative Policy Optimization (GRPO), mainly reward correctness, which does not match the multi-dimensional objectives required in high-stakes fields such as medicine, where reasoning must also be faithful and comprehensive. We introduce Clinical-Objective Relative Policy Optimization (CRPO), a scalable, multi-objective, verifiable reinforcement learning method designed to align LLM post-training with clinical reasoning principles. CRPO integrates rule-based and verifiable reward signals that jointly optimize accuracy, faithfulness, and comprehensiveness without relying on human annotation. To demonstrate its effectiveness, we train…
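As a rough illustration of the idea described in the abstract, the sketch below combines three per-response scores into one scalar reward and then applies the group-relative normalization used by GRPO-style methods (each reward minus the group mean, divided by the group standard deviation). It is not the paper's implementation: the function names, the equal weights, the use of a plain weighted sum, the population standard deviation, and the example scores are all assumptions made for illustration, and the listing does not say how CRPO actually computes or combines its reward signals.

```python
from statistics import mean, pstdev


def combined_reward(accuracy, faithfulness, comprehensiveness,
                    weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three objective scores (hypothetical combination rule)."""
    w_acc, w_faith, w_comp = weights
    return (w_acc * accuracy
            + w_faith * faithfulness
            + w_comp * comprehensiveness)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group, as in GRPO-style training.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std is an assumption here; eps avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for four responses sampled for the same prompt:
# (accuracy, faithfulness, comprehensiveness), each in [0, 1].
group = [
    (1.0, 0.9, 0.8),
    (1.0, 0.4, 0.5),
    (0.0, 0.7, 0.6),
    (0.0, 0.2, 0.1),
]

rewards = [combined_reward(*scores) for scores in group]
advantages = group_relative_advantages(rewards)
```

Under these assumptions, two responses with the same correctness score can still receive different advantages when they differ in faithfulness or comprehensiveness, which a correctness-only reward cannot distinguish.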
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
