You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass

Yinuo Yang; Zixian Ma; Manasi Ganti; Jieyu Zhang; Ranjay Krishna

arXiv:2604.10966 · cs.CV · April 17, 2026

TL;DR

This paper introduces a multi-response discriminative reward model that scores multiple responses simultaneously in a single forward pass, improving efficiency and performance in multimodal reward evaluation.

Contribution

The authors propose a multi-response reward modeling approach that enables N-way preference learning. They also construct two new benchmarks for multimodal response ranking.

Findings

01. Achieves state-of-the-art results on six multimodal reward benchmarks.

02. Provides up to N× wall-clock speedup and FLOPs reduction over conventional single-response scoring when ranking N candidate responses.

03. Improves the open-ended generation quality of policies trained with reinforcement learning when the model is used as the reward.
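To see where an "up to N×" figure can come from, here is a simplified token-count model. It is an assumption for illustration, not the paper's FLOPs accounting: cost is taken as proportional to the number of tokens processed, ignoring the quadratic attention term and the extra attention between packed responses. Under single-response scoring the shared prompt (for example, image or video tokens) is re-encoded once per candidate; under multi-response scoring it is encoded once. The function names and the one-separator-per-response layout are hypothetical.

```python
def tokens_processed(n: int, prompt_len: int, response_len: int,
                     multi_response: bool, sep_len: int = 1) -> int:
    """Tokens passed through the model to score n candidate responses.

    Simplified model (assumption): every response has response_len tokens and
    cost scales linearly with token count.
    """
    if multi_response:
        # One forward pass: the shared prompt once, then each response plus a separator.
        return prompt_len + n * (response_len + sep_len)
    # n forward passes: the prompt is re-encoded alongside every response.
    return n * (prompt_len + response_len)


def approx_speedup(n: int, prompt_len: int, response_len: int, sep_len: int = 1) -> float:
    """Ratio of single-response to multi-response token counts."""
    single = tokens_processed(n, prompt_len, response_len, multi_response=False)
    multi = tokens_processed(n, prompt_len, response_len, multi_response=True, sep_len=sep_len)
    return single / multi


if __name__ == "__main__":
    # Hypothetical numbers: 8 candidates, a 2,000-token video prompt, 100-token answers.
    print(round(approx_speedup(8, 2000, 100), 2))
```

With these hypothetical numbers the ratio is close to 6×. It approaches N when the prompt dominates (long video, short answers) and shrinks toward 1 when responses are much longer than the prompt, which is consistent with the speedup being stated as an upper bound.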

Abstract

We present a discriminative multimodal reward model that scores all candidate responses in a single forward pass. Conventional discriminative reward models evaluate each response independently, requiring a separate forward pass for every candidate response. Our approach concatenates multiple responses with separator tokens and applies cross-entropy over their scalar scores, enabling direct comparative reasoning and efficient $N$-way preference learning. The multi-response design also yields up to $N\times$ wall-clock speedup and FLOPs reduction over conventional single-response scoring. To enable $N$-way reward evaluation beyond existing pairwise benchmarks, we construct two new benchmarks: (1) MR$^{2}$Bench-Image contains human-annotated rankings over responses from 8 diverse models; (2) MR$^{2}$Bench-Video is a large-scale video-based reward benchmark derived from 94K crowdsourced…
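The abstract's core recipe can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. A hypothetical separator id (`SEP_ID`) is appended after each response. Each response's score is assumed to be read from the hidden state at its separator position; the model and score head are omitted. The training target is assumed to be the index of the preferred response, so "cross-entropy over scalar scores" becomes a softmax over the N scores.

```python
import math
from typing import List, Tuple

SEP_ID = -1  # hypothetical separator token id, for illustration only


def pack_responses(prompt_ids: List[int], responses: List[List[int]],
                   sep_id: int = SEP_ID) -> Tuple[List[int], List[int]]:
    """Concatenate the prompt and all N responses into one sequence.

    Returns the packed ids and the index of the separator closing each response;
    in this sketch a scalar score would be read from the hidden state at those indices.
    """
    ids = list(prompt_ids)
    score_positions = []
    for resp in responses:
        ids.extend(resp)
        ids.append(sep_id)                    # assumed: one separator after every response
        score_positions.append(len(ids) - 1)  # position of that separator
    return ids, score_positions


def n_way_cross_entropy(scores: List[float], preferred: int) -> float:
    """Softmax cross-entropy over N scalar scores; the preferred response is the target."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[preferred]


if __name__ == "__main__":
    ids, pos = pack_responses([101, 102], [[7, 8], [9], [10, 11, 12]])
    print(ids, pos)  # [101, 102, 7, 8, -1, 9, -1, 10, 11, 12, -1] [4, 6, 10]
    print(n_way_cross_entropy([2.0, 0.5, -1.0], preferred=0))
```

With N = 2 this loss reduces to the familiar pairwise Bradley-Terry objective, -log σ(s_w - s_l). With larger N, every candidate competes against the preferred one within the same softmax. This is what lets a single packed sequence supply an N-way training signal.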

Code & Models

Models

yinuoy/MR2-Molmo2-4B-RM (model · 37 downloads · 1 like)

Datasets

yinuoy/MR2Bench (dataset · 2.5k downloads)