Training Data Efficiency in Multimodal Process Reward Models

Jinyuan Li; Chengsong Huang; Langlin Huang; Shaoyang Xu; Haolin Liu; Wenxuan Zhang; Jiaxin Huang

arXiv:2602.04145 · cs.LG · February 6, 2026



TL;DR

This paper investigates the data efficiency of training Multimodal Process Reward Models (MPRMs). It finds substantial redundancy in existing training data and proposes a data selection method that prioritizes informative samples, matching full-data performance with only 10% of the data.

Contribution

The paper introduces the Balanced-Information Score (BIS), a data selection criterion grounded in a theoretical analysis of gradient informativeness. BIS improves training efficiency without additional cost because it reuses the existing Monte Carlo (MC) annotation signals.

Findings

01. BIS-selected subsets match or surpass full-data performance.
02. With only 10% of the training data, BIS matches full-data performance and outperforms random sampling.
03. Training saturates quickly under random subsampling, indicating substantial redundancy in existing MC-annotated corpora.

Abstract

Multimodal Process Reward Models (MPRMs) are central to step-level supervision for visual reasoning in MLLMs. Training MPRMs typically requires large-scale Monte Carlo (MC)-annotated corpora, incurring substantial training cost. This paper studies the data efficiency of MPRM training. Our preliminary experiments reveal that MPRM training quickly saturates under random subsampling of the training data, indicating substantial redundancy within existing MC-annotated corpora. To explain this, we formalize a theoretical framework and reveal that informative gradient updates depend on two factors: the label mixture of positive/negative steps and label reliability (the average MC score of positive steps). Guided by these insights, we propose the Balanced-Information Score (BIS), which prioritizes both mixture and reliability based on existing MC signals at the rollout level, without incurring any…
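The abstract names the two ingredients of BIS (label mixture and label reliability) but does not give its formula. The sketch below is a minimal, hypothetical illustration of rollout-level scoring in that spirit, not the authors' definition. Everything in it is an assumption: the `Rollout` container, the rule that a step is positive when its MC score exceeds `POSITIVE_THRESHOLD = 0.0`, the 4p(1-p) mixture term, the product used to combine the two terms, and the helper names `balanced_information_score` and `select_top_fraction`. Only "reliability = average MC score of positive steps" comes from the abstract.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical threshold: a step counts as "positive" if its MC score exceeds it.
POSITIVE_THRESHOLD = 0.0


@dataclass
class Rollout:
    """One MC-annotated reasoning trajectory (hypothetical container)."""
    rollout_id: str
    step_mc_scores: List[float]  # per-step MC estimates in [0, 1]


def step_labels(mc_scores: Sequence[float],
                threshold: float = POSITIVE_THRESHOLD) -> List[bool]:
    """Binarize per-step MC scores into positive/negative step labels."""
    return [s > threshold for s in mc_scores]


def mixture_term(labels: Sequence[bool]) -> float:
    """Assumed mixture measure 4p(1-p): 1.0 for a 50/50 split, 0.0 if all steps share a label."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # fraction of positive steps
    return 4.0 * p * (1.0 - p)


def reliability_term(mc_scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Average MC score of the positive steps (0.0 if the rollout has none)."""
    positives = [s for s, is_pos in zip(mc_scores, labels) if is_pos]
    return sum(positives) / len(positives) if positives else 0.0


def balanced_information_score(rollout: Rollout) -> float:
    """Hypothetical BIS-style score: high only if a rollout is both mixed and reliable."""
    labels = step_labels(rollout.step_mc_scores)
    return mixture_term(labels) * reliability_term(rollout.step_mc_scores, labels)


def select_top_fraction(rollouts: Sequence[Rollout],
                        fraction: float = 0.10) -> List[Rollout]:
    """Keep the highest-scoring `fraction` of rollouts (at least one)."""
    k = max(1, int(len(rollouts) * fraction))
    ranked = sorted(rollouts, key=balanced_information_score, reverse=True)
    return ranked[:k]
```

For example, a rollout with step MC scores [0.9, 0.8, 0.0, 0.0] has a perfectly balanced label mixture (mixture term 1.0) and positive-step reliability 0.85, so it scores about 0.85. An all-positive rollout scores 0 because it carries no label mixture. The multiplicative combination is one simple way to require both properties at once. The paper's actual BIS may weight or combine them differently.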


Taxonomy

Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Explainable Artificial Intelligence (XAI)