Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning
Mingrui Chen, Haogeng Liu, Hao Liang, Huaibo Huang, Wentao Zhang, Ran He

TL;DR
This paper explores how explicitly modeling difficulty priors enhances reinforcement learning fine-tuning for multimodal reasoning, using data filtering, adaptive reweighting, and difficulty hints to improve performance.
Contribution
It introduces a novel difficulty prior modeling approach with data filtering, adaptive advantage reweighting, and explicit difficulty hints for better multimodal reasoning.
Findings
Significant performance improvements on multimodal reasoning benchmarks.
Effective training with only 2.6K training samples using the proposed methods.
Demonstrated benefits of difficulty-aware strategies in RL fine-tuning.
Abstract
In this work, we investigate how explicitly modeling a problem's difficulty prior shapes the effectiveness of reinforcement-learning-based fine-tuning for multimodal reasoning. Our exploration comprises three perspectives. First, through offline data curation, we analyze the U-shaped difficulty distribution of two given datasets by multi-round sampling with the base model, then filter out prompts that are either too simple or too difficult to provide meaningful gradients, and perform subsequent two-stage training. Second, we implement online advantage differentiation, computing group-wise empirical accuracy as a difficulty proxy to adaptively reweight advantage estimates, providing stronger learning signals for more challenging problems. Finally, we introduce difficulty hints as explicit prompts for more complex samples in the second training…
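The first two ideas in the abstract (offline filtering by sampled accuracy, and online advantage reweighting by group accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the strict 0/1 filtering thresholds, the group-normalized advantage baseline, and the linear weight `1 + scale * (1 - accuracy)` are all assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import mean, pstdev


def empirical_accuracy(rewards):
    """Fraction of sampled responses judged correct (rewards are 0 or 1)."""
    return sum(rewards) / len(rewards)


def filter_prompts(accuracy_by_prompt, low=0.0, high=1.0):
    """Keep prompts whose sampled accuracy lies strictly between low and high.

    Assumed thresholds: prompts the base model always solves (accuracy 1.0)
    or never solves (accuracy 0.0) are dropped.
    """
    return [p for p, acc in accuracy_by_prompt.items() if low < acc < high]


def reweighted_advantages(rewards, scale=1.0, eps=1e-6):
    """Group-normalized advantages scaled by a difficulty weight.

    The weight grows as group accuracy falls, so harder prompts get a
    stronger signal. The linear form is a hypothetical choice.
    """
    acc = empirical_accuracy(rewards)
    mu = mean(rewards)
    sigma = pstdev(rewards)
    weight = 1.0 + scale * (1.0 - acc)
    return [weight * (r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Offline stage: accuracies estimated by sampling the base model several times.
    print(filter_prompts({"easy": 1.0, "medium": 0.5, "impossible": 0.0}))
    # Online stage: one group of sampled responses per prompt.
    print(reweighted_advantages([1, 0, 0, 0]))  # hard prompt, larger weight
    print(reweighted_advantages([1, 1, 1, 0]))  # easy prompt, smaller weight
```

In this sketch, a correct response to a prompt with 25% group accuracy receives a larger advantage than a correct response to a prompt with 75% group accuracy, which is the qualitative behavior the abstract describes.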
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Topic Modeling
Methods: Balanced Selection
