DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling

Zhihong Zhang; Jie Zhao; Xiaojian Huang; Jin Xu; Zhuodong Luo; Xin Liu; Jiansheng Wei; Xuejin Chen

arXiv:2604.19544·cs.AI·April 22, 2026

TL;DR

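This paper introduces DT2IT-MRM, a framework that improves multimodal reward models by constructing debiased preference data, reformulating text-to-image preference data, and iteratively training to curate noisy existing datasets, with state-of-the-art results on benchmarks including VL-RewardBench.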

Contribution

The paper proposes a new pipeline and training framework that enhances the quality of multimodal preference datasets for reward modeling.

Findings

01. Achieves state-of-the-art performance on VL-RewardBench.

02. Effectively curates noisy preference datasets.

03. Improves alignment of multimodal models with human preferences.

Abstract

Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets face three key challenges: lack of granularity in preference strength, textual style bias, and unreliable preference signals. Besides, existing open-source multimodal preference datasets suffer from substantial noise, yet there is a lack of effective and scalable curation methods to enhance their quality. To address these limitations, we propose DT2IT-MRM, which integrates a Debiased preference construction pipeline, a novel reformulation of text-to-image (T2I) preference data, and an Iterative Training framework that curates existing multimodal preference datasets for Multimodal Reward…
