Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention
Huanxuan Liao, Wen Hu, Yao Xu, Shizhu He, Jun Zhao, Kang Liu

TL;DR
This paper introduces HyCo2, a hybrid context compression method for LLMs that balances local and global information retention, significantly improving long-text reasoning and reducing token usage.
Contribution
HyCo2 integrates global and local context compression with adaptive token retention, enhancing efficiency and performance in long-sequence inference for LLMs.
Findings
Improves LLM performance by 13.1% on QA benchmarks.
Reduces token usage by 88.8%.
Enhances long-text reasoning capabilities.
Abstract
Large Language Models (LLMs) encounter significant challenges in long-sequence inference due to computational inefficiency and redundant processing, driving interest in context compression techniques. Existing methods often rely on token importance to perform hard local compression or encode context into latent representations for soft global compression. However, the uneven distribution of textual content relevance and the diversity of demands for user instructions mean these approaches frequently lead to the loss of potentially valuable information. To address this, we propose Hybrid Context Compression (HyCo2) for LLMs, which integrates both global and local perspectives to guide context compression while retaining both the essential semantics and critical details for task completion. Specifically, we employ a hybrid adapter to refine global…
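The contrast the abstract draws between hard local compression (keeping important tokens) and soft global compression (encoding the context into a few latent vectors) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `hard_select`, `soft_pool`, and `hybrid_compress`, the hand-written importance scores, and the use of mean pooling in place of a learned encoder are all assumptions made for illustration.

```python
import math


def hard_select(tokens, scores, keep_ratio):
    """Hard (local) compression: keep the highest-scoring tokens, in original order."""
    k = max(1, math.ceil(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


def soft_pool(embeddings, num_slots):
    """Soft (global) compression: mean-pool contiguous chunks into a few vectors.

    Mean pooling is a stand-in (assumption) for a learned latent encoder.
    """
    n, dim = len(embeddings), len(embeddings[0])
    num_slots = min(num_slots, n)  # avoid empty chunks
    slots = []
    for s in range(num_slots):
        chunk = embeddings[s * n // num_slots:(s + 1) * n // num_slots]
        slots.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return slots


def hybrid_compress(tokens, embeddings, scores, keep_ratio=0.4, num_slots=2):
    """Return both views: selected tokens (details) and pooled slots (gist)."""
    return {
        "local_tokens": hard_select(tokens, scores, keep_ratio),
        "global_slots": soft_pool(embeddings, num_slots),
    }


# Hypothetical inputs: importance scores and 2-d embeddings are made up.
tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris", "."]
scores = [0.1, 0.9, 0.8, 0.2, 0.3, 0.95, 0.05]
embeddings = [[float(i), 1.0] for i in range(len(tokens))]

compressed = hybrid_compress(tokens, embeddings, scores)
print(compressed["local_tokens"])
print(compressed["global_slots"])
```

In this toy setting the downstream model would receive three retained tokens plus two pooled vectors instead of seven tokens; a real system would learn both the scores and the pooling.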
Peer Reviews
Decision: Submitted to ICLR 2026
**Strengths**
(1) The paper is clearly written and easy to follow. The motivation is simple yet reasonable, and a direct, effective method is proposed to achieve this motivation.
(2) Thorough ablation experiments are provided, demonstrating the rationale and effectiveness of each design choice.
**Weaknesses**
(1) The paper lacks comparisons with some state-of-the-art methods in Soft Compression, such as ICAE and UniICL, which are mentioned in the related work section.
(2) The description of the TRAINING STRATEGY is insufficiently detailed. The paper does not clearly explain why the paraphrase task and completion task allow the compression module to extract high-quality tokens, why these tasks are particularly suited for local and global compression, respectively, and why paraphrase pre
- **Novel Hybrid Compression Approach**: HyCo² effectively integrates hard (explicit token selection) and soft (latent embedding) compression strategies, achieving a balanced retention of both local details and global semantics. This hybrid framework significantly reduces computational costs and context length without substantial performance loss.
- **Strong Empirical Validation**: Comprehensive experiments across multiple QA benchmarks, including LongBench, show HyCo² consistently surpasses ex
- **Limited Task Diversity in Evaluation**: The majority of experiments focus on question-answering (QA) tasks. It's unclear how well HyCo² generalizes to other tasks such as summarization, reasoning-heavy tasks, code generation, or dialogue scenarios, where context compression needs might differ significantly.
- **Performance Gap on LongBench**: On the LongBench benchmark under the 2k-token constraint, HyCo²'s performance still substantially lags behind the "vanilla" (uncompressed) setting. Th
* The paper proposes a hybrid local-global design, which has not been explored much.
* The three-stage training strategy (paraphrase-completion-instruction tuning) is novel and carefully designed. The effectiveness of this strategy is empirically verified through ablation studies.
* The importance of hard local token selection mechanisms using the classification layer is not sufficiently demonstrated. For example, an ablation study varying Top-k% would be beneficial.
* Overall, there are many unclear and insufficiently justified parts, especially related to model architecture and training.
* Empirical justification of why “noisy” MoE is essential.
* The length of G(V), a gating output, does not seem to match the number of output tokens from LocalMLP or QFormer.
* Th
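For readers unfamiliar with the "noisy" gating the reviewer questions, the sketch below shows the standard noisy top-k gate from the sparsely gated mixture-of-experts literature: each expert's logit gets Gaussian noise scaled by a softplus of a second projection, only the top k logits are kept, and a softmax over them yields the weights. Whether the paper uses exactly this form is an assumption; the names `noisy_top_k_gate`, `w_gate`, and `w_noise` and all weight values are hypothetical.

```python
import math
import random


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def noisy_top_k_gate(x, w_gate, w_noise, k, rng):
    """Return one gate weight per expert; only the top-k experts are nonzero."""
    n_experts = len(w_gate[0])
    logits = []
    for e in range(n_experts):
        clean = sum(x[d] * w_gate[d][e] for d in range(len(x)))
        noise_scale = softplus(sum(x[d] * w_noise[d][e] for d in range(len(x))))
        logits.append(clean + rng.gauss(0.0, 1.0) * noise_scale)
    # Keep the k largest noisy logits and softmax over them only.
    top = sorted(range(n_experts), key=lambda e: logits[e], reverse=True)[:k]
    peak = max(logits[e] for e in top)
    exps = {e: math.exp(logits[e] - peak) for e in top}
    total = sum(exps.values())
    return [exps[e] / total if e in exps else 0.0 for e in range(n_experts)]


# Hypothetical setup: a 2-d input feature and three experts.
x = [1.0, -0.5]
w_gate = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]
w_noise = [[0.1, 0.1, 0.1], [0.0, 0.2, -0.1]]
gates = noisy_top_k_gate(x, w_gate, w_noise, k=2, rng=random.Random(0))
print(gates)
```

Note that this gate produces one weight per expert, not one per output token, which is the kind of dimension question the reviewer raises about G(V).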
Taxonomy
Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · Distributed and Parallel Computing Systems
Methods: Adapter
