MSRL: Scaling Generative Multimodal Reward Modeling via Multi-Stage Reinforcement Learning
Chenglong Wang, Yifu Huo, Yang Gan, Qiaozhi He, Qi Meng, Bei Li, Yan Wang, Junfu Liu, Tianhua Zhou, Jingbo Zhu, Tong Xiao

TL;DR
This paper introduces MSRL, a multi-stage reinforcement learning framework that enables scalable training of generative multimodal reward models using limited multimodal data, significantly improving performance without extra annotations.
Contribution
MSRL proposes a novel multi-stage reinforcement learning approach that transfers textual reward reasoning to multimodal tasks, reducing reliance on costly preference data.
Findings
Performance on VL-RewardBench improves from 66.6% to 75.9% (+9.3 points).
Performance on GenAI-Bench improves from 70.2% to 75.7% (+5.5 points).
Effective scaling of multimodal reward modeling without additional annotations.
Abstract
Recent advances in multimodal reward modeling have been largely driven by a paradigm shift from discriminative to generative approaches. Building on this progress, recent studies have further employed reinforcement learning from verifiable rewards (RLVR) to enhance multimodal reward models (MRMs). Despite its success, RLVR-based training typically relies on labeled multimodal preference data, which are costly and labor-intensive to obtain, making it difficult to scale MRM training. To overcome this limitation, we propose a Multi-Stage Reinforcement Learning (MSRL) approach, which can achieve scalable RL for MRMs with limited multimodal data. MSRL replaces the conventional RLVR-based training paradigm by first learning a generalizable reward reasoning capability from large-scale textual preference data, and then progressively transferring this capability to multimodal tasks through…
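To make the notion of a verifiable reward concrete, the following is a minimal sketch of how a preference label might be turned into an RL reward for a generative reward model. It is not taken from the paper: the `<answer>` tag convention, the two-candidate A/B setup, and the function names `extract_verdict` and `verifiable_reward` are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical output convention (an assumption, not from the paper):
# the model ends its reasoning with "<answer>A</answer>" or "<answer>B</answer>".
_ANSWER_RE = re.compile(r"<answer>\s*([AB])\s*</answer>", re.IGNORECASE)


def extract_verdict(generation: str) -> Optional[str]:
    """Return 'A' or 'B' from the last answer tag, or None if no tag is found."""
    matches = _ANSWER_RE.findall(generation)
    return matches[-1].upper() if matches else None


def verifiable_reward(generation: str, preferred: str) -> float:
    """Return 1.0 if the parsed verdict equals the labeled preference, else 0.0.

    A generation with no parseable verdict also receives 0.0.
    """
    verdict = extract_verdict(generation)
    return 1.0 if verdict == preferred else 0.0
```

Under these assumptions, the reward can be checked automatically against the preference label, which is why each training example needs a labeled preference in the first place.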
Taxonomy
Topics: Multimodal Machine Learning Applications · Emotion and Mood Recognition · Recommender Systems and Techniques
