Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models
Weijie Qiu, Dai Guan, Junxin Wang, Zhihang Li, Yongbo Gai, Mengyu Zhou, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang

TL;DR
This paper introduces Proxy-GRM, a reinforcement learning approach for VLM generative reward models that trains lightweight proxy agents to verify candidate rubrics. The proxy's accuracy at predicting the preference ordering is used as a rubric-quality reward, improving reward evaluation accuracy and rubric transferability.
Contribution
The paper proposes Proxy-GRM, a method that explicitly optimizes rubrics via proxy-guided verification, improving their quality and transferability in VLM reward models.
Findings
Proxy-GRM achieves state-of-the-art results on multiple reward benchmarks.
Proxy rubrics transfer effectively to unseen evaluators, improving test-time reward accuracy.
As a rubric verifier, Proxy-SFT outperforms Proxy-RL.
Abstract
Generative reward models (GRMs) for vision-language models (VLMs) often evaluate outputs via a three-stage pipeline: rubric generation, criterion-based scoring, and a final verdict. However, the intermediate rubric is rarely optimized directly. Prior work typically either treats rubrics as incidental or relies on expensive LLM-as-judge checks that provide no differentiable signal and limited training-time guidance. We propose Proxy-GRM, which introduces proxy-guided rubric verification into Reinforcement Learning (RL) to explicitly enhance rubric quality. Concretely, we train lightweight proxy agents (Proxy-SFT and Proxy-RL) that take a candidate rubric together with the original query and preference pair, and then predict the preference ordering using only the rubric as evidence. The proxy's prediction accuracy serves as a rubric-quality reward, incentivizing the model to produce…
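The rubric-quality reward described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PreferenceExample`, `rubric_reward`, `mean_rubric_reward`, and `toy_keyword_proxy` are hypothetical, the proxy here is a keyword-overlap stand-in rather than a trained Proxy-SFT or Proxy-RL model, and the binary 0/1 reward is an assumption about how "prediction accuracy" might be turned into a per-rubric signal.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PreferenceExample:
    """One preference pair; field names are illustrative, not from the paper."""
    query: str
    response_a: str
    response_b: str
    preferred: str  # "A" or "B"


# A proxy maps (rubric, example) to a predicted label, "A" or "B".
Proxy = Callable[[str, PreferenceExample], str]


def _words(text: str) -> set:
    # Lowercased alphabetic tokens, punctuation dropped.
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_keyword_proxy(rubric: str, example: PreferenceExample) -> str:
    """Hypothetical stand-in for a trained proxy: it prefers the response
    that shares more words with the rubric (ties go to "A")."""
    rubric_words = _words(rubric)
    score_a = len(rubric_words & _words(example.response_a))
    score_b = len(rubric_words & _words(example.response_b))
    return "A" if score_a >= score_b else "B"


def rubric_reward(proxy: Proxy, rubric: str, example: PreferenceExample) -> float:
    """Assumed reward: 1.0 if the proxy, guided by the rubric, recovers the
    ground-truth preference, else 0.0."""
    return 1.0 if proxy(rubric, example) == example.preferred else 0.0


def mean_rubric_reward(
    proxy: Proxy, rubrics: Sequence[str], example: PreferenceExample
) -> float:
    """Average reward over several candidate rubrics for the same example."""
    if not rubrics:
        raise ValueError("rubrics must be non-empty")
    return sum(rubric_reward(proxy, r, example) for r in rubrics) / len(rubrics)


example = PreferenceExample(
    query="What color is the car in the image?",
    response_a="The car is red.",
    response_b="It is a nice day.",
    preferred="A",
)
good_rubric = "The answer must state the color of the car."
bad_rubric = "Reward answers that say it is a nice day."

good_reward = rubric_reward(toy_keyword_proxy, good_rubric, example)
bad_reward = rubric_reward(toy_keyword_proxy, bad_rubric, example)
```

In this toy setup the rubric that points the proxy toward the preferred response earns a reward and the misleading one does not, which is the kind of signal the abstract describes feeding back into RL training of the rubric generator.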
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
