MSRL: Scaling Generative Multimodal Reward Modeling via Multi-Stage Reinforcement Learning

Chenglong Wang; Yifu Huo; Yang Gan; Qiaozhi He; Qi Meng; Bei Li; Yan Wang; Junfu Liu; Tianhua Zhou; Jingbo Zhu; Tong Xiao

arXiv:2603.25108 · cs.CV · March 27, 2026

TL;DR

This paper introduces MSRL, a multi-stage reinforcement learning framework that trains generative multimodal reward models at scale from limited multimodal data, improving results on VL-RewardBench and GenAI-Bench without additional annotations.

Contribution

MSRL is a multi-stage reinforcement learning approach that first learns reward reasoning from large-scale textual preference data and then transfers it to multimodal tasks, reducing reliance on costly multimodal preference data.

Findings

01. Improved performance on VL-RewardBench from 66.6% to 75.9%.

02. Enhanced results on GenAI-Bench from 70.2% to 75.7%.

03. Effective scaling of multimodal reward modeling without additional annotations.
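
To put the two reported benchmark figures side by side, the short sketch below computes the absolute gain (in percentage points) and the relative gain for each. The before and after values are taken from the findings listed here. The helper name `gain` and the rounding to one decimal place are illustrative choices, and this page does not state which metric the percentages measure.

```python
def gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = absolute / before * 100.0
    return round(absolute, 1), round(relative, 1)


# Before/after scores as reported in the findings above.
reported = {
    "VL-RewardBench": (66.6, 75.9),
    "GenAI-Bench": (70.2, 75.7),
}

for name, (before, after) in reported.items():
    points, percent = gain(before, after)
    print(f"{name}: +{points} points ({percent}% relative)")
```

The gains come to roughly 9.3 points on VL-RewardBench and 5.5 points on GenAI-Bench.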

Abstract

Recent advances in multimodal reward modeling have been largely driven by a paradigm shift from discriminative to generative approaches. Building on this progress, recent studies have further employed reinforcement learning from verifiable rewards (RLVR) to enhance multimodal reward models (MRMs). Despite their success, RLVR-based training typically relies on labeled multimodal preference data, which are costly and labor-intensive to obtain, making it difficult to scale MRM training. To overcome this limitation, we propose a Multi-Stage Reinforcement Learning (MSRL) approach, which can achieve scalable RL for MRMs with limited multimodal data. MSRL replaces the conventional RLVR-based training paradigm by first learning a generalizable reward reasoning capability from large-scale textual preference data, and then progressively transferring this capability to multimodal tasks through…
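
As a rough illustration of what a verifiable reward can look like for a generative reward model trained on pairwise preference data, the sketch below scores a generated judgment against the labeled preference. It is a minimal sketch under stated assumptions, not the paper's implementation: the `Verdict: A` / `Verdict: B` output convention, the function names, and the binary 0/1 reward are all hypothetical, and the truncated abstract does not specify how MSRL defines its reward or its later stages.

```python
import re
from typing import Optional

# Hypothetical output convention (an assumption, not from the paper):
# the model ends its written critique with "Verdict: A" or "Verdict: B".
VERDICT_PATTERN = re.compile(r"Verdict:\s*([AB])\b", re.IGNORECASE)


def extract_verdict(generation: str) -> Optional[str]:
    """Return the last verdict ('A' or 'B') found in the text, or None."""
    matches = VERDICT_PATTERN.findall(generation)
    return matches[-1].upper() if matches else None


def verifiable_reward(generation: str, preferred: str) -> float:
    """Return 1.0 if the generated verdict matches the labeled preference, else 0.0.

    A missing or malformed verdict earns 0.0, so the policy is only
    rewarded for judgments that can be checked against the label.
    """
    verdict = extract_verdict(generation)
    return 1.0 if verdict == preferred.upper() else 0.0
```

For example, `verifiable_reward("Response A cites the image correctly. Verdict: A", "A")` returns `1.0`, while a generation with no parsable verdict returns `0.0`. Because the check only needs the preference label, the same function would apply to text-only and image-text pairs alike, which is one reason a text-first stage is plausible. How the paper actually connects its stages is not shown in this excerpt.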

Taxonomy

Topics: Multimodal Machine Learning Applications · Emotion and Mood Recognition · Recommender Systems and Techniques