Entropy-Guided Data-Efficient Training for Multimodal Reasoning Reward Models
Shidong Yang, Tongwen Huang, Hao Wen, Yong Wang, Li Chen, Xiangxiang Chu

TL;DR
This paper introduces an entropy-guided training method for multimodal reasoning reward models. It uses response entropy as an unsupervised proxy for sample difficulty and annotation noise, which improves data efficiency and robustness.
Contribution
The paper proposes an entropy-guided training approach for multimodal reasoning reward models that targets two problems: noise in preference datasets, and training methods that ignore differences in sample difficulty.
Findings
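Entropy-Guided Training (EGT) outperforms state-of-the-art models on three benchmarks.
Response entropy correlates strongly with accuracy, so it can serve as an unsupervised proxy for annotation noise and sample difficulty.
Entropy-guided data curation mitigates the impact of unreliable samples during training.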
Abstract
Multimodal reward models are crucial for aligning multimodal large language models with human preferences. Recent works have incorporated reasoning capabilities into these models, achieving promising results. However, training these models suffers from two critical challenges: (1) the inherent noise in preference datasets, which degrades model performance, and (2) the inefficiency of conventional training methods, which ignore the differences in sample difficulty. In this paper, we identify a strong correlation between response entropy and accuracy, indicating that entropy can serve as a reliable and unsupervised proxy for annotation noise and sample difficulty. Based on this insight, we propose a novel Entropy-Guided Training (EGT) approach for multimodal reasoning reward models, which combines two strategies: (1) entropy-guided data curation to mitigate the impact of unreliable…
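To make the idea of entropy-guided data curation concrete, here is a minimal sketch in Python. It is not the authors' implementation, and the abstract above does not specify how response entropy is computed. The sketch makes three assumptions: response entropy is the mean Shannon entropy of the per-token output distributions, higher entropy marks less reliable samples, and curation is a simple threshold. The names `token_entropy`, `response_entropy`, `curate_by_entropy`, the `token_dists` field, and the `max_entropy` threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def response_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Mean per-token entropy over a response.

    Assumption: averaging over tokens is one plausible aggregation;
    the source does not state which one the paper uses.
    """
    if not token_dists:
        raise ValueError("response has no tokens")
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)


def curate_by_entropy(samples: List[Dict], max_entropy: float) -> List[Dict]:
    """Keep samples whose response entropy is at most max_entropy.

    Assumption: high-entropy responses are treated as unreliable
    (noisy or too hard) and dropped; the threshold is hypothetical.
    """
    return [
        s for s in samples
        if response_entropy(s["token_dists"]) <= max_entropy
    ]


# Toy data: one confident response and one maximally uncertain response,
# each with three tokens over a four-word vocabulary.
confident = [[0.97, 0.01, 0.01, 0.01]] * 3
uncertain = [[0.25, 0.25, 0.25, 0.25]] * 3

samples = [
    {"id": "a", "token_dists": confident},
    {"id": "b", "token_dists": uncertain},
]
kept = curate_by_entropy(samples, max_entropy=1.0)
```

In this toy setup the uniform distributions reach the maximum entropy for a four-word vocabulary (ln 4), so sample "b" falls above the threshold and only "a" is kept. In practice the per-token distributions would come from the reward model's own outputs, and the threshold would need to be tuned on held-out data.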
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
