REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge
Yasi Zhang, Tianyu Chen, Mingyuan Zhou, Oscar Leong, Ying Nian Wu, Michal Lukasik

TL;DR
The paper introduces REAL, a novel reinforcement learning framework that optimizes regression-based rewards for LLM evaluation, improving correlation with human judgments and outperforming existing methods across various model scales.
Contribution
REAL is the first RL framework specifically designed for regression-aware optimization in LLM evaluation, addressing policy-dependence issues with a generalized policy gradient approach.
Findings
REAL outperforms regression-aware SFT baselines and standard RL methods.
Achieves +8.40 Pearson and +7.20 Spearman correlation improvements.
Demonstrates better out-of-domain generalization across model scales.
Abstract
Large language models (LLMs) are increasingly deployed as automated evaluators that assign numeric scores to model outputs, a paradigm known as LLM-as-a-Judge. However, standard Reinforcement Learning (RL) methods typically rely on binary rewards (e.g., 0-1 accuracy), thereby ignoring the ordinal structure inherent in regression tasks; for instance, they fail to recognize that predicting 4 is significantly better than predicting 1 when the ground truth is 5. Conversely, existing regression-aware approaches are often confined to Supervised Fine-Tuning (SFT), limiting their ability to explore optimal reasoning paths. To bridge this gap, we propose REAL (REgression-Aware Reinforcement Learning), a principled RL framework designed to optimize regression rewards, and also proven to be optimal for correlation metrics. A key technical challenge is…
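The contrast between binary and regression-style rewards described in the abstract can be illustrated with a small sketch. This is not the reward used by REAL; the function names `binary_reward` and `squared_error_reward` are hypothetical, and negative squared error is assumed here only as one simple example of a reward that respects ordinal distance.

```python
def binary_reward(pred: int, truth: int) -> float:
    """0-1 accuracy reward: 1.0 only for an exact match."""
    return 1.0 if pred == truth else 0.0


def squared_error_reward(pred: int, truth: int) -> float:
    """Illustrative regression-style reward (an assumption, not the
    paper's definition): negative squared error, so predictions closer
    to the ground truth receive a higher reward."""
    return -float((pred - truth) ** 2)


if __name__ == "__main__":
    truth = 5
    for pred in (1, 4, 5):
        print(
            f"pred={pred} "
            f"binary={binary_reward(pred, truth)} "
            f"squared_error={squared_error_reward(pred, truth)}"
        )
```

With a ground truth of 5, the binary reward gives the same value to predictions of 1 and 4, while the squared-error reward ranks 4 above 1, which is the ordinal structure the abstract says binary rewards ignore.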
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Topic Modeling
