InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing

Shuaiyi Li; Zhisong Zhang; Yang Deng; Chenlong Deng; Tianqing Fang; Hongming Zhang; Haitao Mi; Dong Yu; Wai Lam

arXiv:2505.22156 · cs.CL · January 8, 2026

Open Access · 4 Reviews

TL;DR

InComeS is a framework that combines compression and selection mechanisms to make model editing with large language models more effective and more efficient, especially when the number of edits exceeds what fits in the context window.

Contribution

The paper proposes InComeS, a flexible approach that compresses each editing context into the key-value (KV) cache of a gist token and dynamically selects the relevant gists, enhancing model editing capabilities.

Findings

1. Outperforms existing methods on diverse benchmarks
2. Handles multiple edits efficiently beyond context window limits
3. Improves both effectiveness and computational efficiency

Abstract

Although existing model editing methods perform well in recalling exact edit facts, they often struggle in complex scenarios that require deeper semantic understanding rather than mere knowledge regurgitation. Leveraging the strong contextual reasoning abilities of large language models (LLMs), in-context learning (ICL) becomes a promising editing method by comprehending edit information through context encoding. However, this method is constrained by the limited context window of LLMs, leading to degraded performance and efficiency as the number of edits increases. To overcome this limitation, we propose InComeS, a flexible framework that enhances LLMs' ability to process editing contexts through explicit compression and selection mechanisms. Specifically, InComeS compresses each editing context into the key-value (KV) cache of a special gist token, enabling efficient handling of…
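To make the compress-then-select idea concrete, here is a minimal sketch in plain Python. It assumes each edit has already been compressed into a single cached (key, value) pair for its gist token, and it models selection as scaled dot-product cross-attention from one query token over those cached pairs, with an extra "zero-gist" slot that lets the token attend to no edit at all (the reviews on this page mention such a no-selection option). The function names, the all-zero representation of the zero-gist slot, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_gists(query, gist_keys, gist_values):
    """Attend from one query token over cached gist (key, value) pairs.

    Assumption: the zero-gist slot is modeled as an all-zero key and an
    all-zero value appended after the real gists, so a query that matches
    no edit places most of its weight on that slot and receives an output
    close to zero.
    """
    dim = len(query)
    value_dim = len(gist_values[0])
    keys = gist_keys + [[0.0] * dim]
    values = gist_values + [[0.0] * value_dim]

    # Scaled dot-product score between the query and every key.
    scale = math.sqrt(dim)
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)

    # Weighted sum of the values; the zero-gist slot contributes nothing.
    mixed = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(value_dim)
    ]
    return weights, mixed


if __name__ == "__main__":
    # Two hypothetical compressed edits with 2-dimensional keys and values.
    keys = [[4.0, 0.0], [0.0, 4.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    print(select_gists([4.0, 0.0], keys, values))    # query aligned with edit 0
    print(select_gists([-4.0, -4.0], keys, values))  # query aligned with no edit
```

Because the gist pairs are computed once per edit and only re-read at query time, the cost per query token in this sketch grows with the number of cached gists rather than with the total length of the original edit texts.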

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper proposes using the KV cache to improve efficiency, together with corresponding training algorithms to improve performance, and adds cross-attention modules to dynamically select the most relevant gist information.
2. The paper conducts a large number of experiments and a thorough analysis.

Weaknesses

1. Only two models are used in the paper, and no recent model (e.g., Qwen3-8B) is included. Reporting results on such a model would be more convincing. If time is limited, the authors could consider comparing against only a small number of baselines.
2. The performance of the model in Table 3 is not competitive.
3. The paper uses the method of compressing content into key-value cache of gist tokens to achieve this. However, for different models, different gist tok

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The integration of gist-based context compression with a learnable, dynamic selection mechanism is a novel combination that directly addresses the bottlenecks of ICL for batch editing.
2. By compressing edits in parallel and using a lightweight selection mechanism, it offers substantial speedups over ICL. This makes the approach practical for real-world applications.
3. This paper is clearly written, well organized, and generally easy to understand.

Weaknesses

1. The method requires a continued pre-training phase to teach the model the compression and selection mechanisms.
2. The results in Table 1 show that the improvement of InComeS on Llama-3.2-1B is much greater than that on Qwen2.5-7B. This may indicate that the effectiveness of the method diminishes as the model scale increases.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

(1) Compressing edits into reusable gist KV caches and adding token-level cross-attention to select among them is a clean design. The zero-gist, which serves as a “no-selection” option, reduces interference from the edit context on irrelevant tokens (see ablations) and complements the locality metric.
(2) The paper evaluates the method across multiple scenarios, including multi-hop edits (MQuAKE), natural-language edits (DUNE), and ripple/portability settings (WikiDataCounterfact, ZsRE-extended).

Weaknesses

(1) The paper does not include comparisons with recent strong editors such as the memory-based RECIPE [1] and the ICL-retriever-based DR-IKE [2]. Without these, the empirical claims are not persuasive as evidence of a true advance over contemporary methods.
[1] Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning
[2] Dynamic Retriever for In-Context Knowledge Editing via Policy Optimization
(2) Despite criticizing ICL’s limitations, results show InComeS often performs on par wi

Reviewer 04 · Rating 4 · Confidence 4

Strengths

The paper clearly identifies a significant and practical challenge in model editing and proposes InComeS, a flexible framework that enhances LLMs’ ability to process editing contexts through explicit compression and selection mechanisms. The gist token used for editing is very interesting, and editing the attention module is also novel.

Weaknesses

* How can the gist token be effectively trained? Furthermore, once trained, what metrics should be used to evaluate the generalization capability of the gist token?
* The training process requires approximately 11 hours for Llama-3.2-1B and 35 hours for Qwen2.5-7B. Considering the performance gains achieved, how does the efficiency of this approach compare to other model editing techniques, such as In-Context Learning (ICL)?
* As your experiments indicate, simple fine-tuning (FT) can yie

Taxonomy

Topics: Advanced Data Storage Technologies · Model-Driven Software Engineering Techniques · Digital Rights Management and Security