Auto-Rubric: Learning From Implicit Weights to Explicit Rubrics for Reward Modeling
Lipeng Xie, Sen Huang, Zhuo Zhang, Anni Zou, Yunpeng Zhai, Dingchao Ren, Kezun Zhang, Haoyuan Hu, Boyin Liu, Haoran Chen, Zhaoyang Liu, Bolin Ding

TL;DR
This paper introduces an approach to reward modeling that replaces implicit neural weights with explicit, hierarchical rubrics derived through iterative, verification-driven refinement, and reports outperforming fully trained reward models while using less preference data.
Contribution
It presents a training-free framework for learning explicit rubrics from preference data, enabling interpretable and effective reward functions without gradient descent.
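To make the idea of an explicit rubric acting as a reward function more concrete, here is a minimal sketch. It is not the authors' implementation: the `Dimension` class, the `rubric_score` and `prefers_chosen` helpers, and the keyword-based checks are all hypothetical, and the checks stand in for what would presumably be an LLM judge evaluating natural language criteria.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Dimension:
    """Hypothetical high-level evaluation dimension with concrete checks."""
    name: str
    checks: List[Callable[[str], bool]] = field(default_factory=list)


def rubric_score(rubric: List[Dimension], response: str) -> int:
    """Count how many checks the response passes across all dimensions."""
    return sum(check(response) for dim in rubric for check in dim.checks)


def prefers_chosen(rubric: List[Dimension], chosen: str, rejected: str) -> bool:
    """True if the rubric ranks the chosen response above the rejected one."""
    return rubric_score(rubric, chosen) > rubric_score(rubric, rejected)


# Toy rubric: simple string checks stand in for an LLM judge (assumption).
rubric = [
    Dimension("Correctness", [lambda r: "4" in r]),
    Dimension("Clarity", [
        lambda r: len(r.split()) <= 20,
        lambda r: r.strip().endswith("."),
    ]),
]

# Toy preference pairs as (chosen, rejected); invented for illustration.
pairs = [
    ("2 + 2 equals 4.", "2 + 2 equals 5"),
    ("The answer is 4.", "It might be 4 or maybe 5, hard to say"),
]

accuracy = sum(prefers_chosen(rubric, c, r) for c, r in pairs) / len(pairs)
print(f"pairwise accuracy: {accuracy:.2f}")
```

Because the rubric is plain data rather than model weights, it can be read, edited, and reused without any gradient-based training, which is the property the summary above emphasizes.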
Findings
Outperforms fully trained reward models on multiple benchmarks.
Achieves 80.91% on RewardBench2 with only 70 preference pairs.
Demonstrates high compressibility and interpretability of reward signals.
Abstract
Conventional reward modeling relies on gradient descent over neural weights, creating opaque, data-hungry "black boxes." We propose a paradigm shift from implicit to explicit reward parameterization, recasting optimization from continuous weight spaces to the discrete space of natural language rubrics. We introduce a training-free framework based on iterative rubric learning: it locally induces discriminative criteria via verification-driven refinement, and globally compresses the candidate criteria pool into a compact core set by maximizing an information-theoretic coding rate objective. We organize the compressed core set into a hierarchical rubric structure -- high-level evaluation dimensions supported by concrete verification checks -- serving as an interpretable, portable reward function. Empirically, our approach challenges prevailing data scaling assumptions: using only 70…
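The global compression step can be illustrated with a rough sketch. The abstract does not give the exact objective, so this example assumes the commonly used coding rate form R(Z) = 1/2 log det(I + d/(n ε²) Z Zᵀ), computed here through the equivalent n×n Gram matrix, and assumes each candidate criterion has already been mapped to an embedding vector. The greedy selection loop, the function names, and the value of `eps` are illustrative choices, not the paper's procedure.

```python
import math
from typing import List, Sequence


def logdet_spd(a: List[List[float]]) -> float:
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(s)
                total += 2.0 * math.log(low[i][j])
            else:
                low[i][j] = s / low[j][j]
    return total


def coding_rate(vectors: Sequence[Sequence[float]], eps: float = 0.5) -> float:
    """Assumed form: 0.5 * logdet(I + d / (n * eps^2) * Gram)."""
    n, d = len(vectors), len(vectors[0])
    scale = d / (n * eps ** 2)
    gram = [
        [
            (1.0 if i == j else 0.0)
            + scale * sum(x * y for x, y in zip(vectors[i], vectors[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]
    return 0.5 * logdet_spd(gram)


def greedy_select(vectors: Sequence[Sequence[float]], k: int,
                  eps: float = 0.5) -> List[int]:
    """Greedily pick up to k indices, each time maximizing the coding rate."""
    selected: List[int] = []
    remaining = list(range(len(vectors)))
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda i: coding_rate([vectors[j] for j in selected + [i]], eps),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy embeddings (invented): criteria 0 and 1 are duplicates, 2 is distinct.
candidates = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
core = greedy_select(candidates, k=2)
print(core)
```

In this toy case the selection skips the duplicate criterion in favor of the orthogonal one, since a redundant vector adds less to the log-determinant than a vector pointing in a new direction. That is the intuition behind using a coding rate to shrink a candidate pool into a compact core set.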
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Domain Adaptation and Few-Shot Learning
