Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

Nikhil Prakash; Tamar Rott Shaham; Tal Haklay; Yonatan Belinkov; David Bau

arXiv:2402.14811 · cs.CL · February 23, 2024 · 3 citations

TL;DR

This study investigates how fine-tuning impacts the internal mechanisms of language models, specifically focusing on entity tracking, revealing that fine-tuning improves existing circuits rather than creating new ones.

Contribution

The paper demonstrates that fine-tuning enhances existing entity tracking circuits in language models without fundamentally changing their mechanisms.

Findings

01

Entity tracking circuits are largely unchanged by fine-tuning.

02

Fine-tuning improves the model's ability to handle positional information.

03

The performance gains of the fine-tuned models are primarily attributable to their improved handling of augmented positional information.

Abstract

Fine-tuning on generalized tasks such as instruction following, code generation, and mathematics has been shown to enhance language models' performance on a range of tasks. Nevertheless, explanations of how such fine-tuning influences the internal computations in these models remain elusive. We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the property of entity tracking, a crucial facet of language comprehension, on which models fine-tuned on mathematics show substantial performance gains. We identify the mechanism that enables entity tracking and show that (i) primarily the same circuit implements entity tracking in both the original model and its fine-tuned versions. In fact, the entity tracking circuit of the original model, when evaluated in the fine-tuned versions, performs better than the full original model. (ii) The circuits of…
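The abstract compares the performance of a circuit against that of a full model. One common way to score a circuit, assumed here rather than taken from the paper, is mean ablation: every component outside the circuit has its output replaced by that component's average output over the dataset, and task accuracy is recomputed. The sketch below is a toy illustration under that assumption; the model (logits formed as the sum of two linear "components"), its weights, the dataset, and the helper names `mean_outputs` and `accuracy` are all hypothetical.

```python
from statistics import fmean

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def component_outputs(components, x):
    """Contribution of each component to the answer logits for input x."""
    return [matvec(W, x) for W in components]

def mean_outputs(components, dataset):
    """Average contribution of each component over the whole dataset."""
    per_example = [component_outputs(components, x) for x, _ in dataset]
    return [
        [fmean(ex[c][k] for ex in per_example) for k in range(len(per_example[0][c]))]
        for c in range(len(components))
    ]

def accuracy(components, dataset, circuit=None):
    """Accuracy of the full model, or of `circuit` (a set of component indices)
    with every other component mean-ablated (hypothetical scoring scheme)."""
    means = mean_outputs(components, dataset)
    correct = 0
    for x, label in dataset:
        outs = component_outputs(components, x)
        if circuit is not None:
            # Keep circuit components live; freeze the rest at their mean output.
            outs = [o if c in circuit else means[c] for c, o in enumerate(outs)]
        logits = [sum(vals) for vals in zip(*outs)]  # components add into the logits
        pred = max(range(len(logits)), key=logits.__getitem__)
        correct += pred == label
    return correct / len(dataset)

# Hypothetical toy model: component 0 copies the input, component 1 is a distractor.
components = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.5], [0.0, 0.0]],
]
dataset = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.2], 0), ([0.1, 1.0], 1)]

print(accuracy(components, dataset))               # full model → 0.5
print(accuracy(components, dataset, circuit={0}))  # circuit {0} → 1.0
```

In this toy case the circuit scores higher than the full model because mean-ablating the distractor removes its input-dependent interference while keeping its average offset.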

Code & Models

Models

nikhil07prakash/float-7b (Hugging Face model)

Videos

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking (SlidesLive)

Taxonomy

Topics: Data Stream Mining Techniques

Methods: Activation Patching
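Activation patching can be sketched on a toy network: run the model on a clean input and on a corrupted input, then rerun the corrupted input while overwriting one internal component with its activation from the clean run, and measure how much of the clean behavior returns. This is a minimal illustration, not the authors' implementation. The network, its weights, and the helper names (`forward`, `patching_effects`) are hypothetical, and the patched components here are hidden units of a two-layer network rather than parts of a full language model.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, W2, x, patch=None):
    """Toy two-layer model. `patch` maps hidden-unit index -> value to force in.
    Returns (logits, hidden activations)."""
    hidden = relu(matvec(W1, x))
    if patch:
        hidden = [patch.get(i, h) for i, h in enumerate(hidden)]
    return matvec(W2, hidden), hidden

def logit_diff(logits, correct, wrong):
    return logits[correct] - logits[wrong]

def patching_effects(W1, W2, clean_x, corrupt_x, correct, wrong):
    """For each hidden unit, patch its clean activation into the corrupted run and
    return the fraction of the clean-vs-corrupted logit difference it restores."""
    clean_logits, clean_hidden = forward(W1, W2, clean_x)
    corrupt_logits, _ = forward(W1, W2, corrupt_x)
    clean_ld = logit_diff(clean_logits, correct, wrong)
    corrupt_ld = logit_diff(corrupt_logits, correct, wrong)
    effects = []
    for i, h in enumerate(clean_hidden):
        patched_logits, _ = forward(W1, W2, corrupt_x, patch={i: h})
        patched_ld = logit_diff(patched_logits, correct, wrong)
        effects.append((patched_ld - corrupt_ld) / (clean_ld - corrupt_ld))
    return effects

# Hypothetical weights: hidden units 0 and 1 feed the logits, unit 2 does not.
W1 = [[1, 0], [0, 1], [1, 1]]
W2 = [[1, 0, 0], [0, 1, 0]]
print(patching_effects(W1, W2, clean_x=[1, 0], corrupt_x=[0, 1], correct=0, wrong=1))
# → [0.5, 0.5, 0.0]
```

Units 0 and 1 each restore half of the logit difference and unit 2 restores none, so on this toy task the behavior is localized to units 0 and 1.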