Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning

Mingrui Chen; Haogeng Liu; Hao Liang; Huaibo Huang; Wentao Zhang; Ran He

arXiv:2505.13261 · cs.CV · December 16, 2025

TL;DR

This paper explores how explicitly modeling difficulty priors enhances reinforcement learning fine-tuning for multimodal reasoning, using data filtering, adaptive reweighting, and difficulty hints to improve performance.

Contribution

It introduces a novel difficulty prior modeling approach with data filtering, adaptive advantage reweighting, and explicit difficulty hints for better multimodal reasoning.

Findings

01

Significant performance improvements on multimodal reasoning benchmarks.

02

The proposed methods are effective with only 2.6K training samples.

03

Demonstrated benefits of difficulty-aware strategies in RL fine-tuning.

Abstract

In this work, we investigate how explicitly modeling a problem's difficulty prior shapes the effectiveness of reinforcement-learning-based fine-tuning for multimodal reasoning. Our exploration comprises three perspectives. First, through offline data curation, we analyze the U-shaped difficulty distribution of two given datasets by multi-round sampling with the base model, filter out prompts that are either too simple or too difficult to provide meaningful gradients, and then perform two-stage training. Second, we implement online advantage differentiation, computing group-wise empirical accuracy as a difficulty proxy to adaptively reweight advantage estimates, providing stronger learning signals for more challenging problems. Finally, we introduce difficulty hints as explicit prompts for more complex samples in the second training…
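The first two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the binary 0/1 rewards, the group-normalized advantage baseline, and the linear weight 1 + alpha * (1 - accuracy) are all assumptions chosen for illustration; the abstract states only that group-wise empirical accuracy is used as a difficulty proxy and that harder problems receive stronger learning signals.

```python
from statistics import mean, pstdev


def empirical_accuracy(rewards):
    """Fraction of sampled responses judged correct (rewards assumed 0 or 1)."""
    return sum(rewards) / len(rewards)


def filter_by_difficulty(samples, low=0.0, high=1.0):
    """Offline curation: keep prompts whose accuracy lies strictly between
    `low` and `high`. Prompts that are always solved or never solved give
    no contrast within a group, so they are dropped.
    `low` and `high` are hypothetical thresholds, not values from the paper.
    """
    return [
        prompt_id
        for prompt_id, rewards in samples.items()
        if low < empirical_accuracy(rewards) < high
    ]


def reweighted_advantages(rewards, alpha=1.0, eps=1e-6):
    """Online reweighting: scale group-normalized advantages by a weight
    that grows as group accuracy falls (harder prompt, larger weight).
    The weight formula is an illustrative assumption.
    """
    acc = empirical_accuracy(rewards)
    mu = mean(rewards)
    sigma = pstdev(rewards)
    weight = 1.0 + alpha * (1.0 - acc)
    return [weight * (r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rollout results: prompt id -> rewards of sampled responses.
samples = {
    "always_solved": [1, 1, 1, 1],
    "never_solved": [0, 0, 0, 0],
    "sometimes_solved": [1, 0, 0, 1],
}
kept = filter_by_difficulty(samples)

hard_group = reweighted_advantages([1, 0, 0, 0])  # accuracy 0.25
easy_group = reweighted_advantages([1, 1, 1, 0])  # accuracy 0.75
```

In this sketch only the prompt with mixed outcomes survives filtering, and a correct response in the low-accuracy group receives a larger advantage than a correct response in the high-accuracy group.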

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Topic Modeling

Methods: Balanced Selection