Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning
Zhuoen Chen, Dongfang Li, Meishan Zhang, Baotian Hu, Min Zhang

TL;DR
This paper introduces a cognitively inspired framework for long-context reasoning in LLMs that compresses memory, selectively recalls relevant information, and is optimized via reinforcement learning, significantly improving efficiency and scalability.
Contribution
The paper proposes a reinforcement learning approach to dynamic long-context inference: a compressor and a reasoner are trained jointly end to end, a separately trained gating module selects which compressed memory blocks to recall, and the model avoids processing all raw tokens.
Findings
Achieves competitive accuracy on multi-hop reasoning benchmarks.
Extends context length from 7K to 1.75M tokens.
Reduces GPU memory usage by up to 2x and speeds up inference by 6x.
Abstract
Large Language Models (LLMs) face significant challenges in long-context processing, including quadratic computational costs, information forgetting, and the context fragmentation inherent in retrieval-augmented generation (RAG). We propose a cognitively inspired framework for efficient long-context inference based on chunk-wise compression and selective memory recall, rather than processing all raw tokens. The framework segments long inputs into chunks and encodes each chunk into compressed memory representations using a learned compressor. A gating module dynamically selects relevant memory blocks, which are then iteratively processed by a reasoning module with an evolving working memory to solve downstream tasks. The compressor and reasoner are jointly optimized via end-to-end reinforcement learning, while the gating module is trained separately as a classifier. Experimental results…
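The control flow described in the abstract (chunk, compress, gate, then reason iteratively over a working memory) can be illustrated with a toy sketch. This is not the authors' implementation: every function name and parameter below is hypothetical, and the learned compressor, the classifier gate, and the RL-trained reasoner are each replaced with a simple keyword heuristic so that the example runs without any model.

```python
from typing import List, Set


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Segment a long token sequence into fixed-size chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_compress(chunk: List[str], budget: int) -> List[str]:
    """Stand-in for the learned compressor (assumption): keep at most
    `budget` distinct tokens longer than three characters, in order."""
    kept: List[str] = []
    for tok in chunk:
        if len(kept) >= budget:
            break
        if len(tok) > 3 and tok not in kept:
            kept.append(tok)
    return kept


def toy_gate(query: Set[str], block: List[str]) -> bool:
    """Stand-in for the gating classifier (assumption): a memory block is
    relevant if it shares at least one token with the query."""
    return any(tok in query for tok in block)


def toy_reason_step(working_memory: List[str], block: List[str]) -> List[str]:
    """Stand-in for one reasoning step (assumption): fold the tokens of a
    recalled block into the evolving working memory, skipping duplicates."""
    return working_memory + [t for t in block if t not in working_memory]


def answer_with_compressed_memory(
    tokens: List[str],
    query_tokens: List[str],
    chunk_size: int = 5,
    budget: int = 3,
) -> List[str]:
    """Run the chunk -> compress -> gate -> iterate pipeline end to end."""
    # 1. Compress every chunk once; raw tokens are not revisited afterwards.
    memory = [toy_compress(c, budget) for c in split_into_chunks(tokens, chunk_size)]
    query = set(query_tokens)
    # 2. Visit only the blocks the gate selects, updating working memory each time.
    working_memory: List[str] = []
    for block in memory:
        if toy_gate(query, block):
            working_memory = toy_reason_step(working_memory, block)
    return working_memory


document = (
    "alice lives in paris today the weather was mild there "
    "paris is capital of france bob collects old rare stamps"
).split()
print(answer_with_compressed_memory(document, ["alice", "paris"]))
# → ['alice', 'lives', 'paris', 'capital', 'france']
```

In this toy run the gate skips the second and fourth chunks, so the reasoning loop touches only two of the four compressed blocks. The efficiency argument in the abstract rests on the same kind of skipping, applied with learned components rather than keyword overlap.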
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
