LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning
Bowen Ping, Zijun Chen, Tingfeng Hui, Qize Yu, Chenxuan Li, Junchi Yan, Baobao Chang

TL;DR
LongAct introduces a saliency-guided sparse update method that leverages intrinsic activation patterns in LLMs to improve long-context reasoning under reinforcement learning, yielding an improvement of approximately 8% on LongBench v2.
Contribution
The paper presents LongAct, an approach that selectively updates the weights associated with high-magnitude (salient) activations instead of updating all weights uniformly, improving long-context RL performance across several RL algorithms.
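As a minimal illustration of what a sparse, saliency-guided update means mechanically, the sketch below applies a plain SGD step only to the weight entries selected by a boolean mask and leaves every other entry frozen. The function name `sparse_sgd_step`, the use of plain SGD rather than whatever optimizer LongAct actually uses, and the list-of-lists matrix layout are assumptions made for illustration; how the mask is chosen is not shown here.

```python
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def sparse_sgd_step(weights: Matrix, grads: Matrix, mask: Mask, lr: float) -> Matrix:
    """Hypothetical sketch of a masked (sparse) SGD update.

    Entries where mask is True receive w - lr * g; entries where mask is
    False are copied unchanged, i.e. those weights stay frozen.
    """
    return [
        [w - lr * g if m else w for w, g, m in zip(w_row, g_row, m_row)]
        for w_row, g_row, m_row in zip(weights, grads, mask)
    ]


W = [[1.0, 2.0], [3.0, 4.0]]
G = [[0.5, 0.5], [0.5, 0.5]]
M = [[True, False], [False, True]]
print(sparse_sgd_step(W, G, M, lr=1.0))  # [[0.5, 2.0], [3.0, 3.5]]
```

Only the two masked entries move; in a uniform update all four would change. The gradient is still computed densely here, so this sketch illustrates which weights change, not any compute savings.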
Findings
Achieves approximately 8% improvement on LongBench v2.
Enhances generalization on the RULER benchmark.
Improves performance across multiple RL algorithms, such as GRPO and DAPO.
Abstract
Reinforcement Learning (RL) has emerged as a critical driver for enhancing the reasoning capabilities of Large Language Models (LLMs). While recent advancements have focused on reward engineering or data synthesis, few studies exploit the model's intrinsic representation characteristics to guide the training process. In this paper, we first observe the presence of high-magnitude activations within the query and key vectors when processing long contexts. Drawing inspiration from model quantization, which establishes the criticality of such high-magnitude activations, and from the insight that long-context reasoning inherently exhibits a sparse structure, we hypothesize that the weights associated with these activations serve as the pivotal drivers of effective model optimization. Based on this insight, we propose LongAct, a strategy that shifts from uniform to saliency-guided sparse updates. By selectively updating…
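The observation about high-magnitude activations in query and key vectors can be turned into an update mask in several ways. One hypothetical sketch: score each query/key channel by its mean absolute activation over the tokens of a long input, keep the top-k channels, and mark the corresponding rows of the projection matrix as trainable, since for q = W_q x the output channel j is produced by row j of W_q. The mean-absolute scoring rule, the top-k cutoff, the row-level granularity, and the names `channel_saliency`, `top_k_channels`, and `row_mask` are all assumptions; the exact selection criterion is not given in the text above.

```python
from typing import List


def channel_saliency(acts: List[List[float]]) -> List[float]:
    """Mean absolute activation per channel.

    acts has shape [num_tokens][num_channels], e.g. the query (or key)
    vectors produced for each token of a long input. (Assumed scoring rule.)
    """
    n_tokens = len(acts)
    n_channels = len(acts[0])
    return [sum(abs(tok[c]) for tok in acts) / n_tokens for c in range(n_channels)]


def top_k_channels(saliency: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring channels, returned in ascending order."""
    ranked = sorted(range(len(saliency)), key=lambda c: saliency[c], reverse=True)
    return sorted(ranked[:k])


def row_mask(num_rows: int, num_cols: int, salient_rows: List[int]) -> List[List[bool]]:
    """Boolean mask over a projection matrix W_q (or W_k) of shape [num_rows][num_cols].

    Row j produces output channel j, so a salient channel marks its whole
    row as trainable; every other row stays frozen.
    """
    keep = set(salient_rows)
    return [[r in keep] * num_cols for r in range(num_rows)]


# Two tokens, four query channels; channels 1 and 3 carry large magnitudes.
acts = [[0.1, 5.0, -0.2, 0.3], [0.2, -6.0, 0.1, 4.0]]
salient = top_k_channels(channel_saliency(acts), k=2)
print(salient)  # [1, 3]
print(row_mask(4, 2, salient))
```

Here channels 1 and 3 have mean magnitudes of 5.5 and 2.15, so only rows 1 and 3 of the projection would be updated. Such a mask could then drive a masked update step; whether LongAct scores channels, rows, or individual weights is not specified in this excerpt.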
