COMI: Coarse-to-fine Context Compression via Marginal Information Gain

Jiwei Tang; Shilei Liu; Zhicheng Zhang; Yujin Yuan; Libin Zheng; Wenbo Su; Bo Zheng

arXiv:2602.01719·cs.CL·March 9, 2026

Open Access · 3 Reviews

TL;DR

COMI introduces a novel coarse-to-fine context compression method for large language models that optimizes relevance and diversity, significantly reducing input size while maintaining high performance across tasks.

Contribution

The paper presents COMI, a new adaptive compression framework using Marginal Information Gain to effectively reduce context size with minimal information loss.

Findings

01

Outperforms existing methods by approximately 25 Exact Match (EM) points on NaturalQuestions.

02

Achieves high compression rates (32x) with minimal accuracy loss.

03

Demonstrates effectiveness across multiple tasks and model backbones.

Abstract

Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse tasks. However, their deployment in long-context scenarios remains hindered by computational inefficiency and information redundancy. Context compression methods address these challenges by significantly reducing input length and eliminating redundancy. We propose COMI, a coarse-to-fine adaptive context compression framework that jointly optimizes for semantic relevance and diversity under high compression rates. We introduce Marginal Information Gain (MIG), a metric defined as the relevance of a unit to the input query minus its semantic redundancy with other units, guiding the compression process to prioritize information that is both relevant and low in redundancy. The framework operates in two stages: (1) Coarse-Grained Group Reallocation, where the context is partitioned into groups and dynamically…
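The MIG definition in the abstract (relevance to the query minus redundancy with other units) can be illustrated with a minimal sketch. The abstract does not specify how relevance or redundancy is measured, so this sketch assumes cosine similarity over embedding vectors for relevance and the maximum cosine similarity to any other unit for redundancy. The names `cosine` and `mig_scores` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mig_scores(query: Vector, units: List[Vector]) -> List[float]:
    """Score each unit as relevance minus redundancy (illustrative only)."""
    scores = []
    for i, unit in enumerate(units):
        # Assumption: relevance = cosine similarity between unit and query.
        relevance = cosine(unit, query)
        # Assumption: redundancy = highest similarity to any *other* unit.
        redundancy = max(
            (cosine(unit, other) for j, other in enumerate(units) if j != i),
            default=0.0,
        )
        scores.append(relevance - redundancy)
    return scores


if __name__ == "__main__":
    query = [1.0, 1.0, 0.0]
    # Units 0 and 1 are identical; unit 2 is distinct but equally relevant.
    units = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    for index, score in enumerate(mig_scores(query, units)):
        print(index, round(score, 3))
```

In this toy input all three units are equally relevant to the query, yet the two identical units score lower than the distinct one, which is the behavior the abstract attributes to MIG: preferring information that is relevant and not redundant.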

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. Addresses redundancy among relevant tokens by incorporating a penalty score for related but redundant tokens, effectively reducing token redundancy.
2. Adaptive compression rate allocation across groups avoids applying the same pruning rate to both high-relevance and low-relevance groups, making the method more reasonable.
3. Weighted token merging within groups based on MIG scores reduces redundancy while ensuring relevance and preserving semantic diversity.
4. Experiments demonstrate the eff

Weaknesses

1. Insufficient clarity in figures: The layout of II/I/III in Figure 3 may cause reading difficulties. It is recommended to improve or enhance the labeling.
2. Insufficient comparative experiments under low compression rates: Comparative experiments under low compression rates are only conducted with the Activation Beacon method. It is recommended to include more methods in the tests.
3. Lack of controlled experiments under the same FLOPs: Although the authors compare accuracy under the same comp

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The methods part of the paper is written clearly.
2. The structure of the paper is well organized.

Weaknesses

1. The proposed method seems to work only for the encoder-decoder architecture, but this is not the mainstream architecture nowadays. There is no motivation discussion to clarify why the authors would like to design a compression method that does not work for the most popular decoder-only structure.
2. There is no analysis of the compression rate pattern. The compression rate is determined dynamically via MIG; however, the paper does not provide any analysis to illustrate the pattern of the comp

Reviewer 03 · Rating 6 · Confidence 4

Strengths

Clear Motivation: The paper's motivation is clear. It identifies that LLMs fail to address the high semantic redundancy found among query-relevant tokens, leading to suboptimal attention allocation. The pilot experiment provides empirical evidence for this claim. Redundancy and semantic overlap among compression units have largely not been covered in this domain.

Strong Empirical Results: The method demonstrates significant performance improvements over existing baselines, especially a

Weaknesses

Applicability to Stronger Long-Context Models: While COMI effectively boosts the base models' ability to digest context with reasonable latency, it notably outperforms even the original prompt baseline. This suggests that the chosen base models (LLaMA-2-7B-chat, Qwen2-7B-instruct) have weak innate long-context understanding (also widely known as the "lost-in-the-middle" problem). Therefore, I want to see whether COMI can still provide benefits when applied to current models (e.g. Qwen3-4B

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques