REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge

Yasi Zhang; Tianyu Chen; Mingyuan Zhou; Oscar Leong; Ying Nian Wu; Michal Lukasik

arXiv:2603.17145 · cs.LG · March 19, 2026

TL;DR

The paper introduces REAL, a novel reinforcement learning framework that optimizes regression-based rewards for LLM evaluation, improving correlation with human judgments and outperforming existing methods across various model scales.

Contribution

REAL is the first RL framework specifically designed for regression-aware optimization in LLM evaluation, addressing policy-dependence issues with a generalized policy gradient approach.

Findings

01. REAL outperforms regression-aware SFT baselines and standard RL methods.

02. REAL achieves improvements of +8.40 in Pearson correlation and +7.20 in Spearman correlation.

03. REAL demonstrates better out-of-domain generalization across model scales.
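The findings report agreement with human judgments as Pearson and Spearman correlation. The sketch below shows how these two metrics can be computed for a list of judge scores against human scores. It is a minimal pure-Python illustration, not the paper's evaluation code; the function names `pearson`, `average_ranks`, and `spearman` and the sample scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: the judge preserves the human ordering
# but not the spacing between scores.
human_scores = [1, 2, 3, 4, 5]
judge_scores = [1, 4, 9, 16, 25]
print(pearson(human_scores, judge_scores))
print(spearman(human_scores, judge_scores))
```

Spearman depends only on the ordering of the scores, so a judge that ranks outputs correctly but spaces its scores unevenly can reach a perfect Spearman value while its Pearson value stays below 1.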

Abstract

Large language models (LLMs) are increasingly deployed as automated evaluators that assign numeric scores to model outputs, a paradigm known as LLM-as-a-Judge. However, standard Reinforcement Learning (RL) methods typically rely on binary rewards (e.g., 0-1 accuracy), thereby ignoring the ordinal structure inherent in regression tasks; for instance, they fail to recognize that predicting 4 is significantly better than predicting 1 when the ground truth is 5. Conversely, existing regression-aware approaches are often confined to Supervised Fine-Tuning (SFT), limiting their ability to explore optimal reasoning paths. To bridge this gap, we propose REAL (REgression-Aware Reinforcement Learning), a principled RL framework designed to optimize regression rewards and proven to be optimal for correlation metrics. A key technical challenge is…
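The abstract's example (ground truth 5, predictions 4 and 1) can be made concrete with a small sketch. It contrasts a 0-1 accuracy reward with a distance-based reward. The negative squared error used here is an assumption chosen for illustration; the abstract does not specify which regression reward REAL optimizes, and both function names are hypothetical.

```python
def binary_reward(pred: int, target: int) -> float:
    """0-1 accuracy reward: 1.0 only on an exact match."""
    return 1.0 if pred == target else 0.0


def squared_error_reward(pred: float, target: float) -> float:
    """Illustrative regression-aware reward (an assumption, not
    necessarily the paper's choice): negative squared error."""
    return -((pred - target) ** 2)


target = 5
for pred in (1, 4, 5):
    # Print the prediction next to both rewards for comparison.
    print(pred, binary_reward(pred, target), squared_error_reward(pred, target))
```

Under the binary reward, predictions 4 and 1 both receive 0 and are indistinguishable to the learner. Under the squared-error reward they receive -1 and -16 respectively, so the near miss is preferred over the far miss.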

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Topic Modeling