DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification
Rui Liu, Dian Yu, Zhenwen Liang, Yucheng Shi, Tong Zheng, Runpeng Dai, Haitao Mi, Pratap Tokekar, Leoweiliang

TL;DR
DeltaRubric is a multimodal reward evaluation method that runs a plan-and-execute process within a single multimodal large language model (MLLM) to make preference judgments more accurate.
Contribution
It reformulates multimodal preference evaluation as a joint planning and verification process, enhancing reliability and generalization in reward modeling.
Findings
DeltaRubric improves accuracy by over 18 points on VL-RewardBench.
It outperforms standard no-rubric baselines in multimodal evaluation.
Joint optimization of planning and verification enhances reward model robustness.
Abstract
Aligning Multimodal Large Language Models (MLLMs) requires reliable reward models, yet existing single-step evaluators can suffer from lazy judging: they exploit language priors instead of performing fine-grained visual verification. While rubric-based evaluation mitigates these biases in text-only settings, extending it to multimodal tasks is bottlenecked by the complexity of visual reasoning. The critical differences between responses often depend on instance-specific visual details, so robust evaluation requires dynamically synthesizing rubrics that isolate spatial and factual discrepancies. To address this, we introduce DeltaRubric, an approach that reformulates multimodal preference evaluation as a plan-and-execute process within a single MLLM. DeltaRubric operates in two steps: acting first as a planner, the model generates a neutral, instance-specific verification…
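The two-step plan-and-execute flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `GenerateFn` interface, the `PreferenceSample` fields, the prompt wording, and the `Final: A` / `Final: B` output convention are all assumptions made for illustration. The sketch also assumes the planner sees both candidate responses, which the truncated abstract does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any callable mapping (prompt, image bytes) to text.
# In practice this would wrap a single MLLM used for both steps.
GenerateFn = Callable[[str, bytes], str]


@dataclass
class PreferenceSample:
    """One preference-evaluation instance (field names are assumptions)."""
    image: bytes
    question: str
    response_a: str
    response_b: str


def plan_rubric(generate: GenerateFn, sample: PreferenceSample) -> List[str]:
    """Step 1 (planning): ask the model for a neutral, instance-specific checklist."""
    prompt = (
        "Write a neutral checklist of visual facts to verify, one per line, "
        "covering the points where the two responses disagree. "
        "Do not say which response is better.\n"
        f"Question: {sample.question}\n"
        f"Response A: {sample.response_a}\n"
        f"Response B: {sample.response_b}"
    )
    raw = generate(prompt, sample.image)
    # Drop bullet characters and blank lines; keep one check per entry.
    checks = [line.strip().lstrip("-• ").strip() for line in raw.splitlines()]
    return [c for c in checks if c]


def verify(generate: GenerateFn, sample: PreferenceSample, rubric: List[str]) -> str:
    """Step 2 (verification): check each rubric item against the image, then decide."""
    checks = "\n".join(f"{i}. {c}" for i, c in enumerate(rubric, start=1))
    prompt = (
        "Verify each check against the image, then end with 'Final: A' or 'Final: B'.\n"
        f"Checks:\n{checks}\n"
        f"Question: {sample.question}\n"
        f"Response A: {sample.response_a}\n"
        f"Response B: {sample.response_b}"
    )
    text = generate(prompt, sample.image).upper()
    if "FINAL:" not in text:
        raise ValueError("verifier output has no 'Final:' verdict")
    choice = text.rsplit("FINAL:", 1)[-1].strip()[:1]
    if choice not in ("A", "B"):
        raise ValueError(f"unrecognized verdict: {choice!r}")
    return choice


def judge(generate: GenerateFn, sample: PreferenceSample) -> str:
    """Run both steps with the same model and return the preferred response label."""
    return verify(generate, sample, plan_rubric(generate, sample))
```

Passing the same `generate` callable to both steps mirrors the single-model setup the abstract describes; how the two roles are trained jointly is outside the scope of this sketch.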
