Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification
Yiju Guo, Tianyi Hu, Zexu Sun, Yankai Lin

TL;DR
This paper introduces LENS, a framework that improves reinforcement learning for reasoning tasks by removing interference tokens from prompts, leading to better performance and faster convergence.
Contribution
The paper proposes the Less Noise Sampling Framework (LENS), which enhances RLVR by identifying and removing interference tokens to improve exploration efficiency.
Findings
LENS outperforms GRPO with a 3.88% gain in math reasoning.
LENS achieves over 1.6× speedup in convergence.
LENS improves reasoning performance on scientific and general tasks.
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has advanced LLM reasoning, but remains constrained by inefficient exploration under limited rollout budgets, leading to low sampling success and unstable training in complex tasks. We find that many exploration failures arise not from problem difficulty, but from a small number of prompt tokens that introduce interference. Building on this insight, we propose the Less Noise Sampling Framework (LENS), which first purifies prompts by identifying and removing interference tokens, then transfers successful rollouts from the purification process to supervise policy optimization on the original noisy prompts, enabling the model to learn to ignore interference in real-world, noisy prompting settings. Experimental results show that LENS significantly outperforms GRPO, delivering higher performance and faster convergence, with a 3.88% average…
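The two-stage idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`purify`, `build_training_group`, `group_advantages`), the binary reward convention, and the rule of appending successful purified rollouts to an all-failure group are assumptions made for illustration. How interference tokens are actually identified is not described in this summary, so the sketch takes them as a given set.

```python
from statistics import mean, pstdev


def purify(prompt_tokens, interference):
    """Drop tokens flagged as interference.

    Assumption: the set of interference tokens is supplied by some
    external scoring step that this sketch does not model.
    """
    return [t for t in prompt_tokens if t not in interference]


def build_training_group(original_rollouts, purified_rollouts):
    """Assemble the rollout group used to update the policy on the ORIGINAL prompt.

    Each rollout is a (response, reward) pair with reward > 0 meaning success.
    Hypothetical rule: if the original prompt yielded no success, append the
    successful rollouts sampled from the purified prompt.
    """
    if any(reward > 0 for _, reward in original_rollouts):
        return list(original_rollouts)
    successes = [ro for ro in purified_rollouts if ro[1] > 0]
    return list(original_rollouts) + successes


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantage: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy example: the noisy prompt fails everywhere, the purified one succeeds once.
original = [("resp_a", 0.0), ("resp_b", 0.0)]
purified = [("resp_c", 1.0), ("resp_d", 0.0)]

group = build_training_group(original, purified)
advantages = group_advantages([reward for _, reward in group])
```

Without the transferred rollout, every reward in the group is zero, so all group-relative advantages are zero and the prompt contributes no learning signal. With it, the successful response receives a positive advantage while still being paired with the original noisy prompt.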
