Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification

Yiju Guo; Tianyi Hu; Zexu Sun; Yankai Lin

arXiv:2601.21244·cs.LG·April 21, 2026

TL;DR

This paper introduces LENS, a framework that improves reinforcement learning for reasoning tasks by removing interference tokens from prompts, leading to better performance and faster convergence.

Contribution

The paper proposes the Less Noise Sampling Framework (LENS), which enhances RLVR by identifying and removing interference tokens to improve exploration efficiency.

Findings

01. LENS outperforms GRPO with a 3.88% gain in math reasoning.

02. LENS achieves over 1.6× speedup in convergence.

03. LENS improves reasoning performance on scientific and general tasks.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has advanced LLM reasoning, but remains constrained by inefficient exploration under limited rollout budgets, leading to low sampling success and unstable training on complex tasks. We find that many exploration failures arise not from problem difficulty, but from a small number of prompt tokens that introduce interference. Building on this insight, we propose the Less Noise Sampling Framework (LENS), which first purifies prompts by identifying and removing interference tokens, then transfers successful rollouts from the purification process to supervise policy optimization on the original noisy prompts, enabling the model to learn to ignore interference in real-world, noisy prompting settings. Experimental results show that LENS significantly outperforms GRPO, delivering higher performance and faster convergence, with a 3.88% average…
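The two stages described in the abstract (purify the prompt, then reuse successful rollouts as supervision for the original prompt) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how interference tokens are identified, so the greedy leave-one-out scoring below is an assumption, and `purify`, `transfer`, `toy_success_rate`, `toy_rollout`, and `toy_verify` are all hypothetical names with stand-in logic.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def purify(tokens: Tokens,
           success_rate: Callable[[Tokens], float],
           max_removed: int = 2) -> Tokens:
    """ASSUMED scoring rule: greedily drop the token whose removal most
    raises the estimated success rate, up to max_removed tokens."""
    current = list(tokens)
    for _ in range(max_removed):
        base = success_rate(current)
        best_gain, best_idx = 0.0, None
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            gain = success_rate(candidate) - base
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:  # no single removal helps any further
            break
        del current[best_idx]
    return current


def transfer(original: Tokens,
             purified: Tokens,
             rollout: Callable[[Tokens, int], str],
             verify: Callable[[str], bool],
             n: int) -> List[Tuple[Tokens, str]]:
    """Sample n rollouts on the purified prompt, keep the verified ones,
    and pair each with the ORIGINAL noisy prompt as a training example."""
    pairs = []
    for seed in range(n):
        answer = rollout(purified, seed)
        if verify(answer):
            pairs.append((original, answer))
    return pairs


# Toy stand-ins (hypothetical): the token "maybe" plays the interference token.
def toy_success_rate(tokens: Tokens) -> float:
    return 0.1 if "maybe" in tokens else 0.9


def toy_rollout(tokens: Tokens, seed: int) -> str:
    return "4" if "maybe" not in tokens and seed % 2 == 0 else "5"


def toy_verify(answer: str) -> bool:
    return answer == "4"


original = ["Compute", "2+2", "maybe"]
purified = purify(original, toy_success_rate)
pairs = transfer(original, purified, toy_rollout, toy_verify, n=4)
```

In this toy setup the successful answers are attached to the original prompt, interference token included, which mirrors the abstract's stated goal of teaching the policy to ignore interference rather than relying on a cleaned prompt at inference time. A real RLVR pipeline would replace the stand-ins with policy sampling and a verifiable reward.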
