Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers

Brian K Chen; Tianyang Hu; Hui Jin; Hwee Kuan Lee; Kenji Kawaguchi

arXiv:2406.02847 · cs.LG · June 7, 2024


TL;DR

This paper presents a method for exactly converting in-context learning (ICL) prompts into permanent model weights, in the form of added bias terms, in linearized-attention transformers. This makes ICL interpretable and efficient to integrate without expensive retraining.

Contribution

It introduces ICLCA, an algorithm for exact conversion of ICL prompts into model weights in linearized transformers, and extends the approach to approximate conversion in regular transformers.
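The identity that makes exact conversion possible can be sketched for a single causal linear-attention layer. This sketch assumes the kernelized form with a positive feature map φ. The symbols $B_{KV}$ and $b_K$ are introduced here for illustration and are not necessarily the paper's notation. With m demonstration tokens placed before the query, the output at query position i > m splits into a prompt part and a query part:

```latex
o_i = \frac{\phi(q_i)^\top \sum_{j=1}^{i} \phi(k_j)\, v_j^\top}
           {\phi(q_i)^\top \sum_{j=1}^{i} \phi(k_j)}
    = \frac{\phi(q_i)^\top \bigl( B_{KV} + \sum_{j=m+1}^{i} \phi(k_j)\, v_j^\top \bigr)}
           {\phi(q_i)^\top \bigl( b_K + \sum_{j=m+1}^{i} \phi(k_j) \bigr)},
\qquad
B_{KV} = \sum_{j=1}^{m} \phi(k_j)\, v_j^\top, \quad
b_K = \sum_{j=1}^{m} \phi(k_j).
```

$B_{KV}$ and $b_K$ depend only on the demonstration tokens. They can therefore be computed once and stored as fixed bias terms, after which the prompt can be dropped without changing $o_i$. Softmax attention has no such additive decomposition, which is why conversion in regular transformers can only be approximate.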

Findings

01. Exact conversion of ICL prompts into model weights is feasible in linearized transformers.

02. The method improves the interpretability and efficiency of in-context learning.

03. Approximate conversion still provides valuable contextual information in regular (non-linearized) transformers.

Abstract

In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias terms. We mathematically demonstrate the equivalence between a model with ICL demonstration prompts and the same model with the additional bias terms. Our algorithm (ICLCA) allows for exact conversion in an inexpensive manner. Existing methods are not exact and require expensive parameter updates. We demonstrate the efficacy of our approach through experiments that show the exact incorporation of ICL tokens into a linear transformer. We further suggest how our method can be adapted to achieve…
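The equivalence described in the abstract can be checked numerically on a toy model. The pure-Python sketch below builds a small stack of causal linear-attention layers and runs it two ways: once with a demonstration prompt prepended to the query, and once with the prompt replaced by per-layer bias sums. Because attention is causal, the prompt's hidden states at every layer depend only on the prompt itself. A single forward pass over the prompt therefore fixes each layer's sums. Several choices here are assumptions for illustration: the feature map φ(x) = elu(x) + 1, single-head layers with only a residual connection (no MLP, layer norm, or positional encoding), random weights, and the hypothetical helper names `attn_layer`, `forward`, and `prompt_to_biases`. This is not the authors' ICLCA implementation.

```python
import math
import random


def phi(x):
    # elu(x) + 1 feature map: strictly positive, so the normalizer never vanishes
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attn_layer(tokens, layer, state=None):
    """Causal single-head linear attention plus a residual connection.

    `state` = (S, z) seeds the running sums S = sum phi(k) v^T and z = sum phi(k);
    passing a prompt's final sums here plays the role of the bias terms.
    """
    Wq, Wk, Wv = layer
    d_k, d_v = len(Wk), len(Wv)
    if state is None:
        S, z = [[0.0] * d_v for _ in range(d_k)], [0.0] * d_k
    else:
        S, z = [row[:] for row in state[0]], list(state[1])  # copy: never mutate the bias
    out = []
    for x in tokens:
        q, k, v = phi(matvec(Wq, x)), phi(matvec(Wk, x)), matvec(Wv, x)
        for a in range(d_k):
            z[a] += k[a]
            for b in range(d_v):
                S[a][b] += k[a] * v[b]
        den = sum(q[a] * z[a] for a in range(d_k))
        attn = [sum(q[a] * S[a][b] for a in range(d_k)) / den for b in range(d_v)]
        out.append([xi + ai for xi, ai in zip(x, attn)])  # residual (needs d_v == d_model)
    return out, (S, z)


def forward(tokens, layers, biases=None):
    """Run the stack; return final hidden states and each layer's final (S, z)."""
    h, states = tokens, []
    for i, layer in enumerate(layers):
        h, st = attn_layer(h, layer, None if biases is None else biases[i])
        states.append(st)
    return h, states


def prompt_to_biases(prompt, layers):
    # Hypothetical helper: one pass over the prompt records every layer's sums.
    _, states = forward(prompt, layers)
    return states


random.seed(1)
d, n_layers = 4, 3


def rand_mat():
    return [[random.gauss(0.0, 0.3) for _ in range(d)] for _ in range(d)]


layers = [(rand_mat(), rand_mat(), rand_mat()) for _ in range(n_layers)]
prompt = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(6)]
query = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(2)]

full, _ = forward(prompt + query, layers)    # model that sees the prompt
biases = prompt_to_biases(prompt, layers)    # prompt converted to per-layer biases
compact, _ = forward(query, layers, biases)  # model that sees only the query
max_diff = max(abs(a - b)
               for ra, rb in zip(full[-len(query):], compact)
               for a, b in zip(ra, rb))
print(f"max |difference| = {max_diff:.2e}")
```

The two runs agree on the query tokens to floating-point precision. Once the prompt has been converted, its cost is paid only once. Each query token then costs the same as it would without any prompt, because a linear-attention layer carries a fixed-size state rather than a growing key/value cache.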


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Memory and Neural Computing · EEG and Brain-Computer Interfaces

Methods: Cosine Annealing · Layer Normalization · Weight Decay · Linear Warmup With Cosine Annealing · Attention Dropout · Linear Layer · Byte Pair Encoding · Adam · Attention Is All You Need