Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau

TL;DR
This study investigates how fine-tuning affects the internal mechanisms of language models, using entity tracking as a case study. It finds that fine-tuning improves existing circuits rather than creating new ones.
Contribution
The paper demonstrates that fine-tuning enhances existing entity tracking circuits in language models without fundamentally changing their mechanisms.
Findings
Entity tracking circuits are largely unchanged by fine-tuning.
Fine-tuning improves the model's ability to handle positional information.
The performance gain in fine-tuned models is attributed mainly to better handling of the augmented positional information.
Abstract
Fine-tuning on generalized tasks such as instruction following, code generation, and mathematics has been shown to enhance language models' performance on a range of tasks. Nevertheless, explanations of how such fine-tuning influences the internal computations in these models remain elusive. We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the property of entity tracking, a crucial facet of language comprehension, where models fine-tuned on mathematics have substantial performance gains. We identify the mechanism that enables entity tracking and show that (i) in both the original model and its fine-tuned versions primarily the same circuit implements entity tracking. In fact, the entity tracking circuit of the original model on the fine-tuned versions performs better than the full original model. (ii) The circuits of…
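The page lists activation patching as the method used to locate mechanisms like the entity tracking circuit. As a rough illustration only, the sketch below applies the idea to a hand-built two-layer toy network, not to a language model and not to the authors' code. All names, weights, and inputs here (`hidden`, `output`, `patched_output`, `W1`, `w2`, `x_clean`, `x_corrupt`) are hypothetical and chosen for readability.

```python
# Toy activation patching on a hand-built two-layer network.
# Illustrative assumption: this is NOT the paper's model or code.

def hidden(x, W1):
    # ReLU hidden layer: one activation per row of W1.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]

def output(h, w2):
    # Linear readout over the hidden activations.
    return sum(w * hi for w, hi in zip(w2, h))

def patched_output(x_corrupt, x_clean, W1, w2, unit):
    # Run on the corrupted input, but overwrite one hidden unit
    # with the value it took on the clean input.
    h = hidden(x_corrupt, W1)
    h[unit] = hidden(x_clean, W1)[unit]
    return output(h, w2)

# Hypothetical weights: only hidden unit 0 feeds the output.
W1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [2.0, 0.0]
x_clean = [1.0, 3.0]
x_corrupt = [0.0, 5.0]

clean = output(hidden(x_clean, W1), w2)
corrupt = output(hidden(x_corrupt, W1), w2)

def recovery(unit):
    # Fraction of the clean-vs-corrupt output gap restored by patching this unit.
    patched = patched_output(x_corrupt, x_clean, W1, w2, unit)
    return (patched - corrupt) / (clean - corrupt)

effects = [recovery(u) for u in range(len(W1))]
print(effects)
```

In this toy setup, a unit whose patch restores most of the clean output is a candidate member of the "circuit" for the behavior, while a unit with near-zero recovery is not. Real analyses patch attention heads or residual-stream positions in a transformer and average over many prompt pairs.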
Taxonomy
Topics: Data Stream Mining Techniques
Methods: Activation Patching
