Efficient RLVR Training via Weighted Mutual Information Data Selection
Xinyu Zhou, Boyu Zhu, Haotian Zhang, Huiming Wang, Zhijiang Guo

TL;DR
This paper introduces InSight, a novel data selection method for reinforcement learning with verifiable rewards (RLVR) that improves training efficiency and performance by selecting informative data with a Bayesian, weighted mutual information objective.
Contribution
InSight is a new data sampling method based on weighted mutual information, addressing limitations of difficulty-based heuristics in RL training for language models.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Improves training efficiency by up to 2.2x.
Demonstrates consistent gains across diverse reasoning tasks.
Abstract
Reinforcement learning (RL) plays a central role in improving the reasoning and alignment of large language models, yet its efficiency critically depends on how training data are selected. Existing online selection strategies predominantly rely on difficulty-based heuristics, favouring datapoints with intermediate success rates, implicitly equating difficulty with informativeness and neglecting epistemic uncertainty arising from limited evidence. We introduce InSight, an INformation-guided data SamplInG metHod for RL Training, grounded in a weighted mutual information objective. By modeling data outcomes with Bayesian latent success rates, we show that expected uncertainty reduction decomposes into complementary difficulty- and evidence-dependent components, revealing a fundamental limitation of difficulty-only selection. Leveraging this observation, InSight constructs a stable…
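The contrast the abstract draws between difficulty-only scores and uncertainty reduction can be illustrated with a small Beta-Bernoulli sketch. It assumes a hypothetical Beta(1, 1) prior on each prompt's latent success rate and binary verifiable rewards per rollout, and it computes the standard (unweighted) mutual information between the next rollout's reward and that latent rate. This is not InSight's weighted objective, whose weighting and stabilization are not specified in the excerpt above. The helper names (`posterior`, `difficulty_score`, `mutual_information`) are illustrative only.

```python
import math


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0 via the recurrence psi(x) = psi(x + 1) - 1/x plus an asymptotic series."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    # Shift x upward until the asymptotic expansion is accurate.
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def binary_entropy(p: float) -> float:
    """Entropy in nats of a Bernoulli(p) reward."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def posterior(successes: int, failures: int, a0: float = 1.0, b0: float = 1.0):
    """Hypothetical Beta posterior over a prompt's latent success rate after observed rollouts."""
    return a0 + successes, b0 + failures


def difficulty_score(a: float, b: float) -> float:
    """Difficulty-only heuristic: peaks at a posterior-mean success rate of 0.5, ignores how much evidence exists."""
    mu = a / (a + b)
    return mu * (1 - mu)


def mutual_information(a: float, b: float) -> float:
    """I(y; p) in nats for the next reward y ~ Bernoulli(p), with p ~ Beta(a, b).

    I = H_b(E[p]) - E[H_b(p)], using E[p ln p] = a/(a+b) * (psi(a+1) - psi(a+b+1))
    and E[(1-p) ln(1-p)] = b/(a+b) * (psi(b+1) - psi(a+b+1)).
    """
    n = a + b
    mu = a / n
    expected_entropy = -(
        mu * (digamma(a + 1) - digamma(n + 1))
        + (1 - mu) * (digamma(b + 1) - digamma(n + 1))
    )
    return binary_entropy(mu) - expected_entropy


if __name__ == "__main__":
    # Same posterior-mean success rate (0.5), increasing evidence: the difficulty
    # score stays constant while the mutual information shrinks.
    for s, f in [(0, 0), (4, 4), (32, 32), (0, 1)]:
        a, b = posterior(s, f)
        print(f"{s}/{s + f} solved: difficulty={difficulty_score(a, b):.3f} MI={mutual_information(a, b):.4f}")
```

In this toy setting, the first term H_b(E[p]) depends only on the posterior mean and so behaves like a difficulty score, while the gap to E[H_b(p)] closes as evidence (a + b) accumulates. As a result, a barely sampled prompt at 0/1 can outrank a well-sampled 32/64 prompt under mutual information even though the difficulty heuristic prefers the latter.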
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
