Multidimensional Rubric-oriented Reward Model Learning via Geometric Projection Reference Constraints

Yongnan Jin; Xurui Li; Feng Cao; Liucun Gao; Juanjuan Yao

arXiv:2511.16139·cs.AI·December 5, 2025

TL;DR

This paper presents MR-RML, a multidimensional rubric-oriented reward learning framework with geometric projection reference constraints (GPRC) that improves the alignment of large language models with complex medical standards and reports state-of-the-art results among open-source LLMs on the HealthBench benchmark.

Contribution

It introduces a multi-perspective medical standard system, an independent reward model, and geometric projection constraints to enhance LLM alignment with medical criteria.

Findings

01. Significant performance improvements on the HealthBench benchmark.

02. State-of-the-art results among open-source LLMs.

03. Outperforms many closed-source models on medical evaluation tasks.

Abstract

The integration of large language models (LLMs) into medical practice offers transformative potential, yet their real-world clinical applicability remains constrained by critical alignment issues: (1) a misalignment between static evaluation benchmarks and the dynamic cognitive demands of clinical practice, (2) challenges in adapting to continuously evolving, multi-source medical standards, and (3) the limited capacity of conventional reward models to reflect nuanced, multi-dimensional medical quality criteria. To overcome these limitations, we introduce MR-RML (Multidimensional Rubric-oriented Reward Model Learning) with GPRC (Geometric Projection Reference Constraints), a novel alignment framework that structures medical standards into a multi-perspective matrix to guide both data generation and model optimization. Our approach introduces three key innovations: (1) a medical standard…

Code & Models

Models

Hugging Face: mingpinDZJ/Shanzhi-M1 (model · 2 downloads · 3 likes)

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)