CLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubrics

Aishik Nagar, Arun-Kumar Kaliya-Perumal, Yu-Hsuan Han, Andrew Sheng-Han Huang, Kristen Kee, Yushi Cao, Yiming Chen, Hongchao Jiang

arXiv:2605.09584 · cs.CL · May 12, 2026


TL;DR

CLR-voyance reformulates inpatient clinical reasoning as a POMDP and supervises it with outcome-aware, clinician-validated rubrics to improve reasoning accuracy and evaluation, reporting state-of-the-art results and a real-world deployment.

Contribution

It introduces a novel framework that combines outcome-grounded rewards with clinician-validated rubrics for inpatient reasoning, enhancing model performance and interpretability.

Findings

01

CLR-voyance-8B achieves 84.91% on CLR-POMDP, outperforming GPT-5 and MedGemma-27B.

02

Models trained with CLR-voyance show state-of-the-art reasoning capabilities.

03

Clinician studies validate the clinical relevance and effectiveness of the approach.

Abstract

Inpatient clinical reasoning is a sequential decision under partial observability: the clinician sees the admission so far and must choose the next action whose downstream consequences are not yet visible. Existing clinical-LLM evaluations and RL reward signals collapse this into closed-form retrieval, clinical journey leakage, or unanchored LLM-as-judge scoring. We introduce CLR-voyance, a framework that reformulates inpatient reasoning as a Partially Observable Markov Decision Process (POMDP) and supervises it with rewards that are simultaneously outcome-grounded and clinician-validated. We instantiate the formulation as CLR-POMDP, which partitions successful patient journeys into a policy-visible past and an oracle-only future. Using the past information, an oracle LLM generates a case-specific query-answer pair, and the first adaptive rubric for clinical reasoning which is…
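
The partition into a policy-visible past and an oracle-only future can be pictured with a small sketch. This is not the authors' implementation: the `Event` type, the `partition_journey` helper, the hour-based cutoff, and the example events are all hypothetical and chosen only to illustrate the idea that the policy sees events up to a decision point while later events are withheld.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One entry in a patient journey (hypothetical structure)."""
    hour: float       # hours since admission (assumed time unit)
    description: str  # free-text note, order, or result


def partition_journey(
    events: List[Event], cutoff_hour: float
) -> Tuple[List[Event], List[Event]]:
    """Split a journey at a decision point.

    Events at or before the cutoff form the policy-visible past;
    events strictly after it form the withheld, oracle-only future.
    """
    ordered = sorted(events, key=lambda e: e.hour)
    past = [e for e in ordered if e.hour <= cutoff_hour]
    future = [e for e in ordered if e.hour > cutoff_hour]
    return past, future


# Made-up example journey, for illustration only.
journey = [
    Event(6.0, "broad-spectrum antibiotics started"),
    Event(0.0, "admitted with fever and hypotension"),
    Event(30.0, "cultures positive; antibiotics narrowed"),
    Event(2.0, "blood cultures drawn"),
]

past, future = partition_journey(journey, cutoff_hour=2.0)
```

In this sketch a policy would be shown only `past`, while `future` would stay hidden from it and be available only to whatever component plays the oracle role.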
