Mitigating Copy Bias in In-Context Learning through Neuron Pruning
Ameen Ali, Lior Wolf, Ivan Titov

TL;DR
This paper introduces a neuron pruning technique to reduce copying bias in large language models during in-context learning, improving their ability to learn underlying patterns across diverse tasks.
Contribution
The authors propose a novel neuron pruning method using Integrated Gradients to mitigate copying bias in LLMs without architectural modifications.
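To make the idea concrete, here is a minimal sketch of how Integrated Gradients could rank hidden neurons and how the top-ranked ones could be zeroed out. It is a toy illustration and not the authors' implementation: the function `copy_logit`, its weights, the all-zeros baseline, and the choice of `k = 2` are all assumptions made for this example.

```python
from typing import Callable, List, Sequence


def integrated_gradients(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
    eps: float = 1e-5,
) -> List[float]:
    """Midpoint Riemann-sum approximation of Integrated Gradients for a scalar f.

    Gradients are estimated with central finite differences so the sketch
    needs no autodiff library.
    """
    n = len(x)
    attributions = [0.0] * n
    for step in range(steps):
        alpha = (step + 0.5) / steps
        # Point on the straight path from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for j in range(n):
            up, down = list(point), list(point)
            up[j] += eps
            down[j] -= eps
            grad_j = (f(up) - f(down)) / (2 * eps)
            attributions[j] += grad_j * (x[j] - baseline[j]) / steps
    return attributions


def top_k_neurons(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k neurons with the largest attribution scores."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def prune(activations: Sequence[float], neuron_ids: Sequence[int]) -> List[float]:
    """Return a copy of the activations with the selected neurons set to zero."""
    pruned = list(activations)
    for j in neuron_ids:
        pruned[j] = 0.0
    return pruned


# Hypothetical stand-in for "logit of the copied answer" as a function of
# four hidden activations (weights chosen arbitrarily for illustration).
def copy_logit(h: Sequence[float]) -> float:
    return 2.0 * h[0] - 0.5 * h[1] + 0.1 * h[2] + 1.5 * h[3] + 0.5 * h[0] * h[3]


activations = [1.0, 1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]

scores = integrated_gradients(copy_logit, activations, baseline)
copying_neurons = top_k_neurons(scores, k=2)
pruned_activations = prune(activations, copying_neurons)
```

In this toy setting the attributions sum to `copy_logit(activations) - copy_logit(baseline)`, which is the completeness property of Integrated Gradients, and zeroing the two highest-scoring neurons lowers the copy logit. In a real LLM, the scored quantity, the layers considered, and the number of pruned neurons would have to follow the paper.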
Findings
Pruning neurons identified as copying-prioritized improves ICL performance.
The method is effective across different LLM architectures, including Transformers and State-Space Models.
Pruning enhances task recognition quality in models.
Abstract
Large language models (LLMs) have demonstrated impressive few-shot in-context learning (ICL) abilities. Still, we show that they are sometimes prone to a 'copying bias', where they copy answers from provided examples instead of learning the underlying patterns. In this work, we propose a novel and simple method to mitigate such copying bias. First, we create a synthetic task and use the Integrated Gradients method to identify neurons that prioritize copying over generalization. We demonstrate that pruning these neurons consistently improves performance across a diverse set of ICL tasks. We also show that our method is applicable across various LLM architectures, including Transformers and State-Space Models, without requiring modifications. In our analysis, we adopt a task-recognition perspective on ICL and examine task vectors (Hendel et al., 2023) induced by the model. We find that…
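The reviews below indicate that the synthetic task is vowel counting, with the demonstrations chosen so that the correct answer for the query does not appear among their labels ($\hat{y}\notin S_p$). A minimal sketch of such a prompt builder follows. The word list, the `word -> count` prompt format, and the helper names are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

VOWELS = set("aeiou")


def count_vowels(word: str) -> int:
    """Number of vowels (a, e, i, o, u) in the word."""
    return sum(ch in VOWELS for ch in word.lower())


def build_prompt(
    words: Sequence[str], query: str, n: int, rng: random.Random
) -> Tuple[str, List[str], int]:
    """Sample n demonstrations whose labels all differ from the query's answer."""
    target = count_vowels(query)
    pool = [w for w in words if w != query and count_vowels(w) != target]
    if len(pool) < n:
        raise ValueError("not enough words with a label different from the target")
    demos = rng.sample(pool, n)
    lines = [f"{w} -> {count_vowels(w)}" for w in demos]
    lines.append(f"{query} ->")
    return "\n".join(lines), demos, target


def is_copy_error(prediction: int, demos: Sequence[str], target: int) -> bool:
    """A wrong prediction that matches one of the demonstration labels."""
    demo_labels = {count_vowels(w) for w in demos}
    return prediction != target and prediction in demo_labels


# Hypothetical word list for illustration only.
WORDS = ["cat", "tree", "banana", "sky", "audio", "queue", "lamp", "piano"]

prompt, demos, target = build_prompt(WORDS, query="tree", n=4, rng=random.Random(0))
```

Because no demonstration label equals the target, any prediction that matches a demonstration label is necessarily wrong, so `is_copy_error` isolates copying from other mistakes. As one reviewer notes, labels are small integers, so a large `n` will repeat labels among the demonstrations.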
Peer Reviews
Decision · Submitted to ICLR 2025
This paper successfully applies neuron pruning to tackle the copy bias in ICL. The proposed IG method and pruning strategy are a reasonable way to address the copy bias problem. This is a significant contribution, as previous work primarily focused on prompt engineering and calibration methods without directly addressing the internal model dynamics causing copying errors. Also, the proposed method is versatile, as it can be applied across LLM architectures, including both Transformer
The primary weakness of this paper is the lack of a comprehensive evaluation of the model’s capabilities after pruning. While the method effectively mitigates copy bias, this paper reads more as a case study than as a demonstration of real-world applicability. Pruning techniques generally affect the overall performance of the model, so it is critical for the authors to provide results showing how the pruned model performs on downstream tasks. Unfortunately, this paper
1. The authors use illustrations to help the reader better understand how the proposed method works. 2. The authors consider different ICL tasks for validation.
1. Some implementation details are missing: in Section 3.3, what is the size of n? The authors assume that $\hat{y}\notin S_p$; therefore, if n is large, there will be many repetitions in $S_p$, since the labels in $S_p$ are vowel counts, which are integers. Also, did the authors explicitly hold out an integer so that, when constructing $S_p$, no word has a number of vowels equal to this held-out integer? 2. Universality of copying neurons detection: This paper only studies how to detect
1. This paper demonstrates that mechanistic interpretability methods like Integrated Gradients (IG), typically used for understanding model behaviors, can have practical implications such as improving in-context learning (ICL). By identifying and pruning neurons responsible for copying behavior, the paper presents an innovative approach to enhancing ICL performance. 2. The authors conduct experiments across a wide range of tasks and model architectures, demonstrating consistent performance impr
1. **Similarity of Tasks and Limited Generalizability**: The 18 tasks studied in this paper share notable similarities with vowel counting, meaning that the copying neurons identified through Integrated Gradients primarily apply to a set of very related and rather synthetic tasks. While the authors also attempt to show improvements on non-synthetic tasks, I am not sure how to interpret "copying error" on datasets like SST2 and SST5, which are multiple-choice tasks with a limited set of choices (
Taxonomy
Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training · Pruning
