Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions
Payal Fofadiya, Sunil Tiwari

TL;DR
This paper introduces an adaptive context compression framework for LLMs that improves the quality and efficiency of long-running interactions by selecting which conversational memory to retain and limiting context growth.
Contribution
The paper proposes a novel adaptive compression method combining importance-aware memory, coherence filtering, and dynamic budgeting for improved LLM long-term performance.
Findings
Improves conversational stability and retrieval accuracy.
Reduces token usage and inference latency.
Achieves consistent gains over existing memory-based and compression-based approaches on the LOCOMO, LOCCO, and LongBench benchmarks.
Abstract
Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression framework that integrates importance-aware memory selection, coherence-sensitive filtering, and dynamic budget allocation to retain essential conversational information while controlling context growth. The approach is evaluated on LOCOMO, LOCCO, and LongBench benchmarks to assess answer quality, retrieval accuracy, coherence preservation, and efficiency. Experimental results demonstrate that the proposed method achieves consistent improvements in conversational stability and retrieval performance while reducing token usage and inference latency compared with existing memory and compression-based approaches. These findings indicate that adaptive context…
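The abstract names three components (importance-aware memory selection, coherence-sensitive filtering, and dynamic budget allocation) without giving their definitions here. The following is a minimal sketch of how such a pipeline might fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the `Turn` type, the precomputed `importance` scores, the Jaccard word overlap used as a coherence signal, the whitespace token count, and the linear budget rule in `dynamic_budget`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    index: int         # position in the conversation
    text: str
    importance: float  # hypothetical precomputed importance score in [0, 1]


def count_tokens(text: str) -> int:
    # Whitespace split stands in for a real tokenizer (assumption).
    return len(text.split())


def coherence(text: str, query: str) -> float:
    # Jaccard word overlap as a stand-in coherence signal (assumption).
    a, b = set(text.lower().split()), set(query.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dynamic_budget(n_relevant: int, n_total: int, lo: int, hi: int) -> int:
    # Hypothetical rule: the larger the share of relevant turns,
    # the closer the token budget moves from lo toward hi.
    if n_total == 0:
        return lo
    return lo + round((hi - lo) * n_relevant / n_total)


def compress(turns, query, lo=6, hi=14, min_coherence=0.2):
    # 1. Coherence filter: drop turns unrelated to the current query.
    relevant = [t for t in turns if coherence(t.text, query) >= min_coherence]
    # 2. Dynamic budget: size the context from how much survived the filter.
    budget = dynamic_budget(len(relevant), len(turns), lo, hi)
    # 3. Importance-aware selection: greedily keep the highest-scoring turns that fit.
    kept, used = [], 0
    for t in sorted(relevant, key=lambda t: t.importance, reverse=True):
        cost = count_tokens(t.text)
        if used + cost <= budget:
            kept.append(t)
            used += cost
    # Restore chronological order so the compressed context reads naturally.
    return sorted(kept, key=lambda t: t.index), budget


history = [
    Turn(0, "my dog is named Rex", 0.7),
    Turn(1, "the weather is nice today", 0.2),
    Turn(2, "Rex likes to chase the ball", 0.6),
    Turn(3, "my dog Rex is three years old", 0.95),
]
kept, budget = compress(history, "how old is my dog Rex")
print(budget, [t.text for t in kept])
```

In this toy run the filter drops the two turns with little overlap with the query, and the budget then admits only the highest-importance survivor. A real system would presumably replace the word-overlap and whitespace heuristics with learned scores and a model tokenizer.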
