Implicit In-context Learning
Zhuowei Li, Zihao Xu, Ligong Han, Yunhe Gao, Song Wen, Di Liu, Hao Wang, Dimitris N. Metaxas

TL;DR
Implicit In-context Learning (I2CL) reduces the computational cost of in-context learning by generating a condensed context vector and injecting it during inference, achieving few-shot performance at zero-shot inference cost.
Contribution
I2CL introduces a novel method that condenses demonstration examples into a context vector and performs inference via intervention, significantly lowering inference costs while maintaining performance.
Findings
I2CL achieves few-shot performance at zero-shot inference cost.
It demonstrates robustness to demonstration example variations.
It enhances task similarity detection and transfer learning.
Abstract
In-context Learning (ICL) empowers large language models (LLMs) to swiftly adapt to unseen tasks at inference-time by prefixing a few demonstration examples before queries. Despite its versatility, ICL incurs substantial computational and memory overheads compared to zero-shot learning and is sensitive to the selection and order of demonstration examples. In this work, we introduce Implicit In-context Learning (I2CL), an innovative paradigm that reduces the inference cost of ICL to that of zero-shot learning with minimal information loss. I2CL operates by first generating a condensed vector representation, namely a context vector, extracted from the demonstration examples. It then conducts an inference-time intervention through injecting a linear combination of the context vector and query activations back into the model's residual streams. Empirical evaluation on nine real-world tasks…
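The two steps described in the abstract (condense the demonstrations into a context vector, then inject a linear combination of that vector and the query activations) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: averaging over demonstrations is an assumed way of condensing, and the function names `condense` and `inject` and the per-layer coefficients `lam` and `beta` are hypothetical. Activations are plain Python lists standing in for a model's per-layer residual-stream vectors.

```python
from typing import List

Vector = List[float]


def condense(demo_activations: List[List[Vector]]) -> List[Vector]:
    """Condense per-demonstration activations into one vector per layer.

    demo_activations[i][l] is the activation vector of demonstration i at
    layer l. Averaging over demonstrations is an assumption made for this
    sketch; the abstract only says the vector is "condensed".
    """
    n_demos = len(demo_activations)
    n_layers = len(demo_activations[0])
    return [
        [
            sum(demo[l][k] for demo in demo_activations) / n_demos
            for k in range(len(demo_activations[0][l]))
        ]
        for l in range(n_layers)
    ]


def inject(
    query: List[Vector],
    ctx: List[Vector],
    lam: List[float],
    beta: List[float],
) -> List[Vector]:
    """Return lam[l] * ctx[l] + beta[l] * query[l] for every layer l.

    lam and beta are hypothetical per-layer scalars; how they are chosen
    (for example by calibration) is outside this sketch.
    """
    return [
        [lam[l] * c + beta[l] * q for c, q in zip(ctx[l], query[l])]
        for l in range(len(query))
    ]


# Two demonstrations, two layers, two-dimensional activations.
demos = [
    [[1.0, 3.0], [0.0, 2.0]],
    [[3.0, 5.0], [2.0, 4.0]],
]
ctx = condense(demos)
query = [[1.0, 1.0], [1.0, 1.0]]
mixed = inject(query, ctx, lam=[0.5, 0.0], beta=[1.0, 1.0])
```

With `lam` set to zero at a layer, that layer's activations pass through unchanged, which corresponds to plain zero-shot inference. The demonstrations are needed only to compute `ctx` once; no demonstration tokens appear in the query prompt at inference time.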
Peer Reviews
Decision·ICLR 2025 Poster
1. I2CL presents a unique approach to reducing ICL’s computational load, which is an important advancement, especially for applications with resource constraints.
2. The concept of "task-ids" is a novel contribution that could facilitate transfer learning.
1. I2CL’s dependence on accessing intermediate activations may limit its use in closed or black-box models where such access is restricted. Addressing this limitation or proposing adaptations for such environments would broaden its utility.
2. Expand the evaluation to include more diverse tasks, such as open-ended generation or multi-step reasoning. Demonstrating I2CL’s efficacy in these tasks would make the paper’s findings more universally applicable.
- Proposes a method for performing in-context learning at the cost of zero-shot learning; the method (I2CL) performs comparably to in-context learning.
- The idea of performing activation merging is an interesting direction for making a model learn new skills at negligible cost.
- I2CL scales very well as more samples are added, in contrast to in-context learning.
- The paper compares the method with diverse methods such as task vectors and label anchors to show improvements.
- Learning the weights used to infuse demonstration samples might require some compute, so comparing this method to few-shot parameter-efficient fine-tuning methods would also be helpful. There are two computational costs here: one for extracting the context vector, and another for training the coefficients that combine the samples.
- Caching large context vector (for larger models) might not be feasible which might make it harder to apply this method to
1. This paper is clear and easy to follow. The plots and tables are easy to understand.
2. The proposed method is simple, and it improves inference speed and memory cost compared to vanilla ICL.
3. The proposed algorithm is less sensitive to the order of demonstration examples.
4. The experimental section is comprehensive. The method is compared to various baselines. The authors also conducted careful ablation studies to understand the method.
5. The observation that the calibrated li
1. The major claim of this paper is that I2CL achieves ICL performance at zero-shot cost in terms of both memory usage and inference speed, which raises my concern about the fairness of the comparison. Is the optimization of noisy self-calibration (Sec 2.4) included in the comparison? The calibration process requires running LLM inference on each demonstration example multiple times and optimizing the linear coefficients with Adam. In comparison, ICL only requires a single forward pass of the LLM. In t
Taxonomy
Topics: Innovative Teaching and Learning Methods
