Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

Chungpa Lee; Jy-yong Sohn; Kangwook Lee

arXiv:2602.23197·cs.CL·February 27, 2026

Open Access

TL;DR

This paper gives a theoretical analysis of how fine-tuning linear attention models affects in-context learning. It shows that restricting updates to the value matrix improves zero-shot performance while preserving in-context learning, and that an auxiliary few-shot loss can improve in-context learning on the target task.

Contribution

The paper offers a theoretical framework that explains how fine-tuning objectives change in-context learning in linear attention models, and it suggests strategies to preserve or enhance that ability.

Findings

01 Fine-tuning all attention parameters can harm in-context learning.

02 Restricting updates to the value matrix improves zero-shot performance while preserving in-context learning.

03 An auxiliary few-shot loss enhances in-context learning on target tasks.

Abstract

Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how fine-tuning objectives modify attention parameters and identifies conditions under which this leads to degraded few-shot performance. We show that fine-tuning all attention parameters can harm in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning. We…
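
The abstract does not spell out the model parameterization, so the following is only a minimal sketch of the idea of updating the value matrix alone. It assumes a single linear attention layer with a merged key-query matrix `W_kq` and a value matrix `W_v`, applied to linear-regression prompts, which is a common setup in theoretical work on in-context learning. The function names, the squared loss, and the single gradient step on a few-shot prompt are illustrative assumptions, not the authors' objectives or method.

```python
import numpy as np


def _context(X, y, x_query):
    """Build the demonstration Gram matrix and the query token (assumed prompt format)."""
    n = X.shape[0]
    Z = np.hstack([X, y[:, None]])   # each row is a token z_i = (x_i, y_i)
    G = Z.T @ Z / n                  # (d+1, d+1) average of z_i z_i^T
    z_q = np.append(x_query, 0.0)    # query token with the label slot set to 0
    return G, z_q


def linear_attention_predict(W_kq, W_v, X, y, x_query):
    """Prediction is the label slot of W_v @ G @ W_kq @ z_q (no softmax)."""
    G, z_q = _context(X, y, x_query)
    return float((W_v @ G @ W_kq @ z_q)[-1])


def fine_tune_step(W_kq, W_v, X, y, x_query, y_query, lr=0.1, value_only=True):
    """One gradient step on 0.5 * (prediction - y_query)**2.

    With value_only=True, W_kq is frozen and only W_v is updated.
    """
    G, z_q = _context(X, y, x_query)
    h = G @ W_kq @ z_q                       # context summary seen by W_v
    err = (W_v @ h)[-1] - y_query            # prediction error
    e = np.zeros(W_v.shape[0])
    e[-1] = 1.0                              # selects the label slot
    new_W_v = W_v - lr * err * np.outer(e, h)
    if value_only:
        return W_kq.copy(), new_W_v
    grad_kq = err * np.outer(G @ W_v.T @ e, z_q)
    return W_kq - lr * grad_kq, new_W_v


# Hypothetical toy prompt: two demonstrations of y = w . x with w = (2, -1).
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([2.0, -1.0])
x_q = np.array([3.0, 4.0])
W_kq = np.diag([1.0, 1.0, 0.0])              # attends over the input coordinates only
W_v = np.zeros((3, 3))
W_v[-1, -1] = 1.0                            # copies the label slot

before = linear_attention_predict(W_kq, W_v, X, y, x_q)
W_kq_new, W_v_new = fine_tune_step(W_kq, W_v, X, y, x_q, y_query=2.0)
after = linear_attention_predict(W_kq_new, W_v_new, X, y, x_q)
print(before, after)
```

Passing `value_only=False` also updates `W_kq`, which corresponds to the "fine-tune all attention parameters" setting in this toy sketch. It does not reproduce the paper's analysis of when that setting degrades few-shot performance.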

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications