In-Context Editing: Learning Knowledge from Self-Induced Distributions
Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, Zilong Zheng

TL;DR
This paper presents Consistent In-Context Editing (ICE), a novel method enabling language models to efficiently incorporate new knowledge through in-context learning, improving robustness and avoiding overfitting without extensive retraining.
Contribution
ICE introduces a simple optimization framework that aligns model output distributions with and without additional context, enhancing knowledge editing capabilities.
Findings
ICE improves accuracy of knowledge editing.
ICE maintains linguistic quality and model integrity.
ICE demonstrates robustness across various editing scenarios.
Abstract
In scenarios where language models must incorporate new information efficiently without extensive retraining, traditional fine-tuning methods are prone to overfitting, degraded generalization, and unnatural language generation. To address these limitations, we introduce Consistent In-Context Editing (ICE), a novel approach leveraging the model's in-context learning capability to optimize toward a contextual distribution rather than a one-hot target. ICE introduces a simple yet effective optimization framework for the model to internalize new knowledge by aligning its output distributions with and without additional context. This method enhances the robustness and effectiveness of gradient-based tuning methods, preventing overfitting and preserving the model's integrity. We analyze ICE across four critical aspects of knowledge editing: accuracy, locality, generalization, and linguistic…
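The contrast the abstract draws between a one-hot target and a contextual distribution can be illustrated with a toy calculation. The following is a minimal sketch, not the authors' implementation: the four-token vocabulary, the logit values, the helper names, and the direction of the KL divergence are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def one_hot_cross_entropy(q, target_index, eps=1e-12):
    """Cross-entropy of q against a one-hot target (classic fine-tuning loss)."""
    return -math.log(q[target_index] + eps)

# Hypothetical next-token logits over a toy 4-token vocabulary.
logits_with_context = [0.2, 3.0, 1.5, -1.0]     # the new fact is given in the prompt
logits_without_context = [2.5, 0.3, 0.1, -1.0]  # same query, no added context

p_ctx = softmax(logits_with_context)
p_plain = softmax(logits_without_context)

# Assumed consistency objective: pull the no-context distribution toward the
# with-context one, so the soft target keeps probability mass on alternatives.
consistency_loss = kl_divergence(p_ctx, p_plain)

# One-hot alternative: only the single target token (index 1 here) matters.
hard_label_loss = one_hot_cross_entropy(p_plain, target_index=1)

print(f"consistency (KL) loss: {consistency_loss:.4f}")
print(f"one-hot cross-entropy: {hard_label_loss:.4f}")
```

In this sketch the KL term reaches zero once the two distributions match, whereas the one-hot loss keeps decreasing only as the model concentrates all probability on a single token, which is one intuition for the overfitting the abstract attributes to traditional fine-tuning.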
Peer Reviews
Decision: ICLR 2025 Poster
- The paper is clear, well-motivated and the idea is novel as far as I know.
- Compared to other baselines, this method is the only one capable of effectively editing knowledge continually.
- The pipeline is quite heavy, relying on sampling at every optimization step and GPT-4 for augmented contexts.
1. The method is novel for knowledge editing by identifying an issue with prior approaches for targeted knowledge editing. While there has been prior work on fine-tuning and naive work on in-context (prompt-based) knowledge editing, the combination of distilling the in-context editing directly into the parameters has not been done.
2. The empirical results, while not perfect on all metrics and datasets, show promise across the baseline methods presented and on the standard metrics and perplexit
1. The method is similar to knowledge/context distillation or gisting, and so a connection should be drawn there. Still, applying this method appears novel for knowledge editing. However, the lack of references to KD/gisting makes it hard to place how related (or not) this idea is to that line of work. [Snell et al., 2022](https://arxiv.org/abs/2209.15189) - Context Distillation; [Mu et al., 2023](https://arxiv.org/abs/2304.08467) - Gisting
2. The paper advocates for conditioning on “context” t
The paper proposes an interesting method that is likely to be useful for future applications. The paper does a good job of demonstrating the usefulness of the method.
The paper studies only one base model, so it is not clear how the method generalizes to other models. In particular, I suspect that the size of the model and its base in-context capabilities might play an important role in the success of the method.
Taxonomy
Topics: Machine Learning and Algorithms · Data Stream Mining Techniques
