LoFiT: Localized Fine-tuning on LLM Representations

Fangcong Yin; Xi Ye; Greg Durrett

arXiv:2406.01563 · cs.CL · November 1, 2024 · 2 cites

TL;DR

LoFiT is a localized fine-tuning framework that identifies the attention heads most important for a task and trains offset vectors at those heads, matching the performance of LoRA while modifying 20x-200x fewer parameters.

Contribution

This work presents LoFiT, which localizes fine-tuning to a sparse set of attention heads and outperforms existing representation intervention methods on truthfulness and reasoning tasks.

Findings

01. LoFiT localizes to 3%-10% of attention heads.

02. LoFiT outperforms representation intervention methods in truthfulness and reasoning tasks.

03. LoFiT matches the performance of LoRA while modifying 20x-200x fewer parameters.
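
To give a rough sense of scale for the 3%-10% figure, the arithmetic below counts how many heads and offset parameters that range would imply. The model shape (32 layers, 32 heads per layer, 128 dimensions per head), the helper name `selected_head_budget`, the choice to round up, and the assumption of exactly one offset vector of size `head_dim` per selected head are all illustrative and not taken from the paper.

```python
import math


def selected_head_budget(n_layers, n_heads, head_dim, fraction):
    """Return (selected heads, offset parameters) for a given fraction of heads.

    Assumes one offset vector of length head_dim per selected head
    (an illustrative assumption, not a detail confirmed by the source).
    """
    total_heads = n_layers * n_heads
    k = math.ceil(total_heads * fraction)  # rounding up is an arbitrary choice
    return k, k * head_dim


# Hypothetical model shape, chosen only for illustration.
for fraction in (0.03, 0.10):
    k, params = selected_head_budget(32, 32, 128, fraction)
    print(f"{fraction:.0%} of heads: {k} heads, {params} offset parameters")
```

Under these assumptions the trained offsets number in the thousands to low tens of thousands of parameters, which is the kind of budget that makes a comparison against larger adapter methods plausible.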

Abstract

Recent work in interpretability shows that large language models (LLMs) can be adapted for new tasks in a learning-free way: it is possible to intervene on LLM representations to elicit desired behaviors for alignment. For instance, adding certain bias vectors to the outputs of certain attention heads is reported to boost the truthfulness of models. In this work, we show that localized fine-tuning serves as an effective alternative to such representation intervention methods. We introduce a framework called Localized Fine-Tuning on LLM Representations (LoFiT), which identifies a subset of attention heads that are most important for learning a specific task, then trains offset vectors to add to the model's hidden representations at those selected heads. LoFiT localizes to a sparse set of heads (3%-10%) and learns the offset vectors from limited training data, comparable to the settings…
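
The core operation described in the abstract, adding a trained offset vector to the representation at each selected head, can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function name `apply_offsets`, the `(layer, head)` keys, the toy two-dimensional vectors, and the assumption that the selected heads and their offsets are already available are all hypothetical.

```python
def apply_offsets(head_outputs, offsets):
    """Add an offset vector to the output of each selected head.

    head_outputs: dict mapping (layer, head) -> list of floats
                  (one token's output for that head).
    offsets:      dict mapping (layer, head) -> list of floats,
                  present only for the selected heads (hypothetical values).
    Heads without an offset are passed through unchanged.
    """
    result = {}
    for key, vec in head_outputs.items():
        if key in offsets:
            result[key] = [x + b for x, b in zip(vec, offsets[key])]
        else:
            result[key] = list(vec)
    return result


# Toy example: three heads, only head (0, 1) is "selected".
head_outputs = {(0, 0): [1.0, 2.0], (0, 1): [0.5, -0.5], (1, 0): [0.0, 1.0]}
offsets = {(0, 1): [0.25, 0.25]}
steered = apply_offsets(head_outputs, offsets)
print(steered)
```

In a real model the offsets would be trainable parameters and the base weights would stay frozen; the dictionary form here only shows that the intervention touches the selected heads and leaves every other head's output untouched.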

Code & Models

Repositories

fc2869/lo-fit
PyTorch · Official

Videos

LoFiT: Localized Fine-tuning on LLM Representations · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Neural Networks and Applications

Methods: Sparse Evolutionary Training