FoldAct: Efficient and Stable Context Folding for Long-Horizon Search Agents
Jiaqi Shao, Yufeng Miao, Wei Zhang, Bing Luo

TL;DR
FoldAct is a framework for stable, efficient long-horizon reinforcement learning with large language models. It targets the non-stationarity and computational cost introduced by context folding, improving training stability and speed.
Contribution
FoldAct combines a separated loss, context consistency, and segment training to address the non-stationarity and computational cost that context folding introduces into RL.
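To make the "separated loss" idea concrete, here is a minimal sketch of why a pooled token-level mean can dilute the training signal for summary tokens. It is not the authors' implementation: the function name `token_weights`, the `summary_weight` parameter, and the choice to average summary and action tokens separately are all assumptions made for illustration. The sketch computes how much weight each token's loss receives under a pooled mean versus under two separately normalized terms.

```python
from typing import List


def token_weights(
    is_summary: List[bool],
    separated: bool,
    summary_weight: float = 1.0,  # hypothetical knob, not from the paper
) -> List[float]:
    """Return the coefficient each token's loss gets in the total loss.

    Pooled:    loss = mean over all tokens, so every token gets 1 / N.
    Separated: loss = mean(action tokens) + summary_weight * mean(summary tokens),
               so each group is normalized by its own size.
    """
    n_total = len(is_summary)
    n_summary = sum(is_summary)
    n_action = n_total - n_summary

    if not separated:
        return [1.0 / n_total for _ in is_summary]

    weights = []
    for flag in is_summary:
        if flag:
            weights.append(summary_weight / n_summary)
        else:
            weights.append(1.0 / n_action)
    return weights


def summary_share(is_summary: List[bool], weights: List[float]) -> float:
    """Fraction of the total loss weight that lands on summary tokens."""
    on_summary = sum(w for w, flag in zip(weights, is_summary) if flag)
    return on_summary / sum(weights)


# Toy trajectory: 95 ordinary action tokens followed by 5 summary tokens.
mask = [False] * 95 + [True] * 5

pooled_share = summary_share(mask, token_weights(mask, separated=False))
separated_share = summary_share(mask, token_weights(mask, separated=True))

print(f"pooled summary share:    {pooled_share:.2f}")
print(f"separated summary share: {separated_share:.2f}")
```

In this toy setup the summary tokens carry about 5% of the loss weight under the pooled mean and about half of it once the two groups are normalized separately. The actual loss used by FoldAct may differ in form and weighting.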
Findings
Achieves 5.19× training speedup.
Addresses non-stationary observation distribution.
Enables stable training of long-horizon search agents.
Abstract
Long-horizon reinforcement learning (RL) for large language models faces critical scalability challenges from unbounded context growth, leading to context folding methods that compress interaction history during task execution. However, existing approaches treat summary actions as standard actions, overlooking that summaries fundamentally modify the agent's future observation space, creating a policy-dependent, non-stationary observation distribution that violates core RL assumptions. This introduces three fundamental challenges: (1) gradient dilution where summary tokens receive insufficient training signal, (2) self-conditioning where policy updates change summary distributions, creating a vicious cycle of training collapse, and (3) computational cost from processing unique contexts at each turn. We introduce FoldAct (https://github.com/SHAO-Jiaqi757/FoldAct), a…
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
