You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass
Yinuo Yang, Zixian Ma, Manasi Ganti, Jieyu Zhang, Ranjay Krishna

TL;DR
This paper introduces a multi-response discriminative reward model that scores multiple responses simultaneously in a single forward pass, improving efficiency and performance in multimodal reward evaluation.
Contribution
The authors propose a novel multi-response reward modeling approach that enables N-way preference learning and constructs new benchmarks for multimodal response ranking.
Findings
Achieves state-of-the-art results on six multimodal reward benchmarks.
Provides up to N× speedup and FLOPs reduction over traditional single-response models.
Improves reinforcement learning policies with better open-ended generation quality.
Abstract
We present a discriminative multimodal reward model that scores all candidate responses in a single forward pass. Conventional discriminative reward models evaluate each response independently, requiring multiple forward passes, one for each potential response. Our approach concatenates multiple responses with separator tokens and applies cross-entropy over their scalar scores, enabling direct comparative reasoning and efficient N-way preference learning. The multi-response design also yields up to N× wall-clock speedup and FLOPs reduction over conventional single-response scoring. To enable N-way reward evaluation beyond existing pairwise benchmarks, we construct two new benchmarks: (1) MRBench-Image contains human-annotated rankings over responses from 8 diverse models; (2) MRBench-Video is a large-scale video-based reward benchmark derived from 94K crowdsourced…
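The core idea in the abstract (concatenate the candidates with separator tokens, then apply cross-entropy over their scalar scores) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The `<sep>` token string, the helper names `build_input`, `select_scores`, and `nway_cross_entropy`, and the choice to read each response's score at its separator position are all assumptions made for illustration; the summary does not say where the scalar scores are read from.

```python
import math

SEP = "<sep>"  # hypothetical separator token; the real token is not given in the summary


def build_input(prompt_tokens, responses):
    """Concatenate the prompt and all candidate responses into one sequence.

    Returns the token list and the index of the separator that follows each
    response (assumed here to be where that response's score is read).
    """
    tokens = list(prompt_tokens)
    sep_positions = []
    for response in responses:
        tokens.extend(response)
        sep_positions.append(len(tokens))  # index the separator will occupy
        tokens.append(SEP)
    return tokens, sep_positions


def select_scores(per_position_scores, sep_positions):
    """Pick one scalar score per response from per-position model outputs."""
    return [per_position_scores[i] for i in sep_positions]


def nway_cross_entropy(scores, best_index):
    """Cross-entropy of softmax(scores) against the preferred response.

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[best_index]


# Usage: one concatenated sequence covers all N = 3 candidates, so a single
# forward pass would produce all three scores.
tokens, sep_positions = build_input(["q"], [["a", "b"], ["c"], ["d", "e"]])
fake_scores = [0.0] * len(tokens)  # stand-in for a model's per-position scalar outputs
fake_scores[sep_positions[1]] = 2.0  # pretend the second response scores highest
scores = select_scores(fake_scores, sep_positions)
loss = nway_cross_entropy(scores, best_index=1)
```

Because all N responses share one sequence, the scores come from a single forward pass instead of N separate ones, which is the source of the "up to N×" speedup quoted above. With N = 2 the loss reduces to the usual pairwise logistic preference loss on the score difference.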
