Joint Localization and Activation Editing for Low-Resource Fine-Tuning
Wen Lai, Alexander Fraser, Ivan Titov

TL;DR
JoLA jointly learns which attention heads of a large language model to edit and how to edit them, improving low-resource fine-tuning across a range of tasks.
Contribution
The paper introduces JoLA, a joint learning approach for localization and activation editing, enhancing stability and performance in low-resource scenarios.
Findings
JoLA outperforms existing methods on three benchmarks.
It effectively identifies relevant model modules for editing.
JoLA improves task performance in low-data settings.
Abstract
Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, are commonly used to adapt LLMs. However, the effectiveness of standard PEFT methods is limited in low-resource scenarios with only a few hundred examples. Recent advances in interpretability research have inspired the emergence of activation editing (or steering) techniques, which modify the activations of specific model components. Because their parameter counts are extremely small, these methods show promise for small datasets. However, their performance depends heavily on identifying the correct modules to edit, and it is often unstable across datasets. In this paper, we propose Joint Localization and Activation Editing (JoLA), a method that jointly learns (1) which heads in the Transformer to edit; (2) whether the intervention should be additive, multiplicative, or both; and (3) the intervention parameters…
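To make the three learned choices concrete, here is a minimal Python sketch of a gated per-head edit. It assumes each selected head's output z is edited as (1 + g_m * m) ⊙ z + g_a * a, where m and a are learned vectors and g_m, g_a in [0, 1] are gates. The names HeadEdit, edit_head, edit_layer and the parameter-count helpers are hypothetical, and fixed scalar gates stand in for however the authors actually learn them; this is an illustration, not the JoLA implementation. In the sketch, a head with both gates at zero is left untouched (localization), and a single nonzero gate yields a purely additive or purely multiplicative edit.

```python
from dataclasses import dataclass


@dataclass
class HeadEdit:
    """Hypothetical per-head intervention with gated multiplicative and additive parts."""
    scale: list[float]  # m: multiplicative offset, applied as (1 + gate_mul * m)
    shift: list[float]  # a: additive vector, applied as gate_add * a
    gate_mul: float     # g_m in [0, 1]; 0 switches the multiplicative edit off
    gate_add: float     # g_a in [0, 1]; 0 switches the additive edit off


def edit_head(z: list[float], e: HeadEdit) -> list[float]:
    """Return (1 + g_m * m) * z + g_a * a, computed elementwise."""
    return [(1.0 + e.gate_mul * m) * x + e.gate_add * a
            for x, m, a in zip(z, e.scale, e.shift)]


def edit_layer(heads: list[list[float]], edits: dict[int, HeadEdit]) -> list[list[float]]:
    """Edit only the selected heads; every other head passes through unchanged."""
    return [edit_head(z, edits[i]) if i in edits else list(z)
            for i, z in enumerate(heads)]


def edit_param_count(num_edited_heads: int, d_head: int) -> int:
    # Assumed cost per edited head: two d_head vectors (m, a) plus two scalar gates.
    return num_edited_heads * (2 * d_head + 2)


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    # A LoRA update B @ A uses A (rank x d_in) and B (d_out x rank) per adapted matrix.
    return rank * (d_in + d_out)


if __name__ == "__main__":
    heads = [[1.0, 2.0], [1.0, 2.0]]
    edits = {1: HeadEdit(scale=[0.5, -0.5], shift=[0.1, 0.2], gate_mul=1.0, gate_add=0.0)}
    print(edit_layer(heads, edits))  # [[1.0, 2.0], [1.5, 1.0]]: head 1 is scaled only
    print(edit_param_count(8, 128), lora_param_count(4096, 4096, 8))  # 2064 65536
```

Under these assumptions, editing 8 heads of dimension 128 costs 2,064 parameters, whereas a rank-8 LoRA update of a single 4096 × 4096 matrix costs 65,536. This gap illustrates why activation editing suits datasets of only a few hundred examples.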
