CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning

Tianshi Xu; Yuteng Chen; Meng Li

arXiv:2601.15141·cs.LG·January 22, 2026

TL;DR

CLEANER introduces a self-purification method for agentic RL that uses model-intrinsic self-correction to improve learning efficiency and accuracy, especially in parameter-constrained models, by constructing cleaner training trajectories.

Contribution

The paper presents CLEANER, a novel trajectory purification approach leveraging self-correction to enhance agentic RL training without external filtering or high computational costs.

Findings

01. Achieves 6%, 3%, and 5% accuracy improvements on benchmarks.
02. Matches state-of-the-art performance with only one-third of training steps.
03. Demonstrates scalable, efficient trajectory purification for agentic RL.

Abstract

Agentic Reinforcement Learning (RL) has empowered Large Language Models (LLMs) to utilize tools like Python interpreters for complex problem-solving. However, for parameter-constrained models (e.g., 4B–7B), the exploration phase is often plagued by frequent execution failures, creating noisy trajectories that hinder policy optimization. Under standard outcome-based reward settings, this noise leads to a critical credit assignment issue, where erroneous actions are inadvertently reinforced alongside successful outcomes. Existing mitigations face a dilemma: dense rewards often trigger reward hacking, while supersampling incurs prohibitive computational costs. To address these challenges, we propose CLEANER. Distinct from external filtering methods, CLEANER exploits the model's intrinsic self-correction capabilities to eliminate error-contaminated context directly during data collection.…
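
The abstract is cut off before it describes the mechanism, so the following is only an illustrative sketch of the problem it states, not the authors' algorithm. It shows how an outcome-based reward credits a failed tool call when the episode ends successfully, and how removing failures that the model later corrected yields a cleaner trajectory. The `Step` schema, the `purify` and `outcome_credit` helpers, and the rule "a later successful step counts as the self-correction" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One agent action and its tool result (hypothetical schema)."""
    action: str
    observation: str
    failed: bool  # True if the tool call ended in an execution error


def outcome_credit(trajectory: List[Step], reward: float) -> List[float]:
    """Outcome-based reward: every step receives the same final reward."""
    return [reward] * len(trajectory)


def purify(trajectory: List[Step]) -> List[Step]:
    """Drop failed steps that are later followed by a successful step.

    Assumption: the successful step is the model's own correction of the
    preceding failures. Trailing failures with no correction are kept.
    """
    cleaned: List[Step] = []
    pending: List[Step] = []  # failures not yet known to be corrected
    for step in trajectory:
        if step.failed:
            pending.append(step)
        else:
            pending.clear()  # a success supersedes the buffered failures
            cleaned.append(step)
    cleaned.extend(pending)
    return cleaned


noisy = [
    Step("run solver v1", "NameError: x is not defined", failed=True),
    Step("run solver v2", "42", failed=False),
    Step("submit answer 42", "correct", failed=False),
]

# The failed first step is rewarded along with the successful ones.
noisy_credit = outcome_credit(noisy, reward=1.0)

# After purification only the corrected steps remain to be reinforced.
clean = purify(noisy)
clean_credit = outcome_credit(clean, reward=1.0)
```

In this toy case the noisy trajectory assigns a reward of 1.0 to the step that raised `NameError`, while the purified trajectory contains only the two successful steps.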

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)