Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
Chungpa Lee, Jy-yong Sohn, Kangwook Lee

TL;DR
This paper provides a theoretical analysis of how fine-tuning linear attention models affects in-context learning. It shows that restricting updates to the value matrix preserves in-context learning, whereas fine-tuning all attention parameters can harm it, and that an auxiliary few-shot loss can improve in-context learning on the target task.
Contribution
It offers a theoretical framework explaining the impact of fine-tuning on in-context learning in linear attention models and suggests strategies to preserve or enhance it.
Findings
Fine-tuning all attention parameters can harm in-context learning.
Restricting updates to the value matrix improves zero-shot performance while preserving in-context learning.
An auxiliary few-shot loss enhances in-context learning on the target task.
Abstract
Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how fine-tuning objectives modify attention parameters and identifies conditions under which this leads to degraded few-shot performance. We show that fine-tuning all attention parameters can harm in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning. We…
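To make the setting in the abstract concrete, the following is a minimal NumPy sketch of a single-layer linear attention model on an in-context linear regression prompt, together with a gradient step that updates only the value matrix. It is an illustration under stated assumptions, not the authors' parametrization: the prompt layout, the names `build_prompt`, `predict`, `squared_loss_grads`, and `sgd_step`, the weight initialization, and the learning rate are all hypothetical choices made for this example.

```python
import numpy as np

def build_prompt(xs, ys, x_query):
    """Stack n demonstrations (x_i, y_i) and the query (x_query, 0) as columns."""
    demos = np.vstack([xs.T, ys[None, :]])        # shape (d+1, n)
    query = np.append(x_query, 0.0)[:, None]      # shape (d+1, 1)
    return np.hstack([demos, query])              # shape (d+1, n+1)

def predict(Z, W_kq, W_v):
    """Softmax-free attention; the prediction is the bottom-right output entry."""
    n = max(Z.shape[1] - 1, 1)                    # number of demonstrations
    attn = Z.T @ W_kq @ Z / n                     # shape (n+1, n+1)
    return (W_v @ Z @ attn)[-1, -1]

def squared_loss_grads(Z, y_true, W_kq, W_v):
    """Analytic gradients of 0.5 * (pred - y_true)**2 for both matrices."""
    n = max(Z.shape[1] - 1, 1)
    z_q = Z[:, -1]
    err = predict(Z, W_kq, W_v) - y_true
    # pred = W_v[-1] @ Z @ Z.T @ W_kq @ z_q / n, so only the last row of W_v matters.
    g_v = np.zeros_like(W_v)
    g_v[-1] = err * (Z @ Z.T @ W_kq @ z_q) / n
    a = Z @ (W_v[-1] @ Z)                         # shape (d+1,)
    g_kq = err * np.outer(a, z_q) / n
    return {"W_kq": g_kq, "W_v": g_v}

def sgd_step(params, grads, trainable, lr=0.1):
    """Update only the matrices named in `trainable`; the rest stay frozen."""
    return {k: (v - lr * grads[k] if k in trainable else v.copy())
            for k, v in params.items()}

# Assumed initialization: with these weights the prediction equals
# (1/n) * sum_i y_i * <x_i, x_query>, a one-step gradient descent estimate.
d = 2
params = {"W_kq": np.diag([1.0, 1.0, 0.0]), "W_v": np.diag([0.0, 0.0, 1.0])}

xs = np.eye(d)                                    # two demonstrations
w = np.array([2.0, 3.0])                          # hypothetical task vector
ys = xs @ w
x_query = np.array([1.0, 1.0])
Z = build_prompt(xs, ys, x_query)

few_shot_pred = predict(Z, **params)

# One value-only step on this prompt, with the true label w @ x_query as target.
grads = squared_loss_grads(Z, w @ x_query, **params)
value_only = sgd_step(params, grads, trainable={"W_v"})
```

Passing `trainable={"W_kq", "W_v"}` instead would correspond to updating all attention parameters, the regime the abstract reports can harm in-context learning. The sketch only shows the mechanics of freezing one matrix; it does not reproduce the paper's fine-tuning objectives or results.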
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
