DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification

Rui Liu; Dian Yu; Zhenwen Liang; Yucheng Shi; Tong Zheng; Runpeng Dai; Haitao Mi; Pratap Tokekar; Leoweiliang

arXiv:2605.09269·cs.CL·May 12, 2026

TL;DR

DeltaRubric is a multimodal reward evaluation method that runs a plan-and-execute process within a single multimodal large language model (MLLM) to improve the accuracy of preference judgments.

Contribution

It reformulates multimodal preference evaluation as a joint planning and verification process, enhancing reliability and generalization in reward modeling.

Findings

1. DeltaRubric improves accuracy by over 18 points on VL-RewardBench.

2. It outperforms standard no-rubric baselines in multimodal evaluation.

3. Joint optimization of planning and verification enhances reward model robustness.

Abstract

Aligning Multimodal Large Language Models (MLLMs) requires reliable reward models, yet existing single-step evaluators can suffer from lazy judging, exploiting language priors over fine-grained visual verification. While rubric-based evaluation mitigates these biases in text-only settings, extending it to multimodal tasks is bottlenecked by the complexity of visual reasoning. The critical differences between responses often depend on instance-specific visual details. Robust evaluation requires dynamically synthesizing rubrics that isolate spatial and factual discrepancies. To address this, we introduce DeltaRubric, an approach that reformulates multimodal preference evaluation as a plan-and-execute process within a single MLLM. DeltaRubric operates in two steps: acting first as a Disagreement Planner, the model generates a neutral, instance-specific verification…
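
The two-step flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `MLLM` callable interface, the function names, the prompt wording, and in particular the second step, which the truncated abstract does not describe. The sketch assumes the second step has the same model check the planner's output against the image and return a preference.

```python
from typing import Callable, Dict

# Hypothetical interface (not from the paper): a callable that takes a text
# prompt and an image reference and returns the model's text output.
MLLM = Callable[[str, str], str]


def plan_rubric(model: MLLM, image: str, question: str,
                resp_a: str, resp_b: str) -> str:
    """Step 1 (planner role): request a neutral list of disagreements."""
    prompt = (
        "List the points where the two responses disagree about the image. "
        "Do not say which response is correct.\n"
        f"Question: {question}\nResponse A: {resp_a}\nResponse B: {resp_b}"
    )
    return model(prompt, image)


def verify_with_rubric(model: MLLM, image: str, question: str,
                       resp_a: str, resp_b: str, rubric: str) -> str:
    """Step 2 (assumed verifier role): check the rubric items, return 'A' or 'B'."""
    prompt = (
        "Check each item below against the image, then answer with a single "
        "letter, A or B, for the better response.\n"
        f"Items:\n{rubric}\n"
        f"Question: {question}\nResponse A: {resp_a}\nResponse B: {resp_b}"
    )
    verdict = model(prompt, image).strip().upper()
    if verdict not in {"A", "B"}:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return verdict


def judge(model: MLLM, image: str, question: str,
          resp_a: str, resp_b: str) -> Dict[str, str]:
    """Run both roles with the same model and return rubric plus preference."""
    rubric = plan_rubric(model, image, question, resp_a, resp_b)
    preferred = verify_with_rubric(model, image, question, resp_a, resp_b, rubric)
    return {"rubric": rubric, "preferred": preferred}
```

Passing the same `model` callable to both roles mirrors the abstract's "within a single MLLM" framing. Any real use would need an actual model client behind that callable and more robust parsing of the verdict.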
