TL;DR
This paper introduces SeCo, a semantic-driven context compression method for LLMs that overcomes position bias, improving efficiency and robustness in long-context scenarios.
Contribution
SeCo shifts context compression from position-based to semantic-based, enhancing performance stability and semantic integrity in large language models.
Findings
SeCo outperforms existing methods on 14 benchmarks.
SeCo reduces inference latency significantly.
SeCo improves out-of-domain robustness.
Abstract
Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks. However, their deployment in long-context scenarios faces high computational overhead and information redundancy. While soft prompt compression has emerged as a promising way to mitigate these costs by compressing sequences into compact embeddings, existing paradigms remain fundamentally constrained by position bias: they primarily rely on inserting learnable tokens at fixed positions or on grouping tokens according to their physical token layout, which induces performance instability and semantic fragmentation. To overcome this bottleneck, we propose Semantic Consistency Context Compression (SeCo), a method that shifts context compression from position-driven to semantic-driven. Rather than being constrained by physical token layout, SeCo dynamically anchors compression directly in the semantic space by…
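The contrast the abstract draws between position-driven and semantic-driven grouping can be illustrated with a toy sketch. This is not SeCo's algorithm, whose details are cut off in the abstract above. The function names, the greedy cosine-similarity clustering, the mean pooling, and the 0.8 threshold are all assumptions chosen only to show why grouping by physical layout can mix unrelated tokens.

```python
from math import sqrt

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def position_groups(n_tokens, group_size):
    """Position-driven grouping: consecutive fixed-size windows."""
    return [list(range(s, min(s + group_size, n_tokens)))
            for s in range(0, n_tokens, group_size)]

def semantic_groups(embeddings, threshold):
    """Hypothetical semantic grouping: each token joins the existing group
    whose centroid is most similar (cosine >= threshold), else starts a new one."""
    groups = []
    for idx, emb in enumerate(embeddings):
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(emb, mean([embeddings[i] for i in group]))
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append([idx])
        else:
            best.append(idx)
    return groups

def compress(embeddings, groups):
    """Mean-pool each group into one compact embedding (an assumed pooling choice)."""
    return [mean([embeddings[i] for i in group]) for group in groups]

# Toy context: two topics interleaved token by token.
embeddings = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.0], [0.0, 0.9]]

by_position = position_groups(len(embeddings), 2)   # [[0, 1], [2, 3]]
by_semantics = semantic_groups(embeddings, 0.8)     # [[0, 2], [1, 3]]

print(by_position, compress(embeddings, by_position))
print(by_semantics, compress(embeddings, by_semantics))
```

In this toy input the position-based windows each mix one token from both topics, so the pooled vectors blur the two directions together, while the similarity-based groups keep each topic in its own compressed slot.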
