Rethinking Rubric Generation for Improving LLM Judge and Reward Modeling for Open-ended Tasks

William F. Shen; Xinchi Qiu; Chenxi Whitehouse; Lisa Alazraki; Shashwat Goel; Francesco Barbieri; Timon Willi; Akhil Mathur; Ilias Leontiadis

arXiv:2602.05125 · cs.LG · February 6, 2026

TL;DR

This paper introduces RRD, a recursive framework for refining rubrics to improve LLM judging and reward modeling, leading to more accurate evaluations and stronger reinforcement learning signals in open-ended tasks.

Contribution

We propose RRD, a recursive rubric refinement method that expands coverage, removes redundant criteria, and filters out criteria misaligned with the preference direction, improving both LLM judge accuracy and reward quality.

Findings

01 Improves preference judgment accuracy by up to +17.7 points.

02 Boosts reward signals by up to 160% over prior methods.

03 Achieves consistent gains across multiple benchmarks.

Abstract

Recently, rubrics have been used to guide LLM judges in capturing subjective, nuanced, multi-dimensional human preferences, and have been extended from evaluation to reward signals for reinforcement fine-tuning (RFT). However, rubric generation remains hard to control: rubrics often lack coverage, conflate dimensions, misalign preference direction, and contain redundant or highly correlated criteria, degrading judge accuracy and producing suboptimal rewards during RFT. We propose RRD, a principled framework for rubric refinement built on a recursive decompose-filter cycle. RRD decomposes coarse rubrics into fine-grained, discriminative criteria, expanding coverage while sharpening separation between responses. A complementary filtering mechanism removes misaligned and redundant rubrics, and a correlation-aware weighting scheme prevents over-representing highly correlated criteria,…
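
To make the idea of correlation-aware weighting concrete, here is a minimal sketch in Python. The abstract does not give the weighting formula, so everything below is an assumption for illustration rather than the authors' method: each rubric is down-weighted by the sum of its absolute Pearson correlations with all rubrics (itself included), and the function names `pearson`, `correlation_aware_weights`, and `weighted_score` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_aware_weights(scores):
    """Assumed scheme: weight_k is proportional to 1 / sum_j |corr(k, j)|.

    scores[r][k] is the judge's score for response r on rubric k.
    Rubrics that move together split the weight a single rubric would get.
    """
    columns = list(zip(*scores))  # one tuple of scores per rubric
    raw = []
    for i, col_i in enumerate(columns):
        redundancy = sum(
            1.0 if i == j else abs(pearson(col_i, col_j))
            for j, col_j in enumerate(columns)
        )
        raw.append(1.0 / redundancy)
    total = sum(raw)
    return [w / total for w in raw]  # normalize so the weights sum to 1


def weighted_score(rubric_scores, weights):
    """Aggregate one response's per-rubric scores into a scalar judgment or reward."""
    return sum(s * w for s, w in zip(rubric_scores, weights))


# Toy data: rubrics 0 and 1 are duplicates, rubric 2 is uncorrelated with both.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
weights = correlation_aware_weights(scores)
print(weights)  # [0.25, 0.25, 0.5]
```

In this toy case the two duplicate rubrics together carry the same weight as the single independent rubric, which is the over-representation problem the abstract says the weighting scheme is meant to prevent. A uniform average would instead let the duplicated criterion count twice.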

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Emotion and Mood Recognition · Ethics and Social Impacts of AI