Mechanisms are Transferable: Data-Efficient Low-Resource Adaptation via Circuit-Targeted Supervised Fine-Tuning
Khumaisa Nur'aini, Ayu Purwarianti, Alham Fikri Aji, Derry Wijaya

TL;DR
This paper introduces CT-SFT, a method for low-resource language adaptation of large language models that selectively fine-tunes task-relevant attention heads, improving cross-lingual accuracy and reducing catastrophic forgetting.
Contribution
The paper proposes Circuit-Targeted Supervised Fine-Tuning (CT-SFT), a novel head-level parameter updating method that enhances low-resource language adaptation while preserving source language knowledge.
Findings
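CT-SFT improves cross-lingual accuracy over continued full fine-tuning on NusaX-Senti and XNLI while updating only a small subset of model parameters.
Updating only the selected heads reduces catastrophic forgetting of the source language.
A trade-off is observed between editing circuit heads and preserving low-relevance heads; harder transfers favor editing circuit heads.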
Abstract
Adapting LLMs to low-resource languages is difficult: labeled data is scarce, full-model fine-tuning is unstable, and continued cross-lingual tuning can cause catastrophic forgetting. We propose Circuit-Targeted Supervised Fine-Tuning (CT-SFT): a counterfactual-free adaptation of CD-T (Contextual Decomposition Transformer) that uses a label-balanced mean baseline and task-directional relevance scoring to identify a sparse set of task-relevant attention heads in a proxy-language checkpoint, then transfer learns to a target language by updating only those heads (plus LayerNorm) via head-level gradient masking. Across NusaX-Senti and XNLI, CT-SFT improves cross-lingual accuracy over continued full fine-tuning while updating only a small subset of model parameters. We find an editing-preserving trade-off: harder transfers favor editing circuit heads, while easier transfers often favor…
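The head-level gradient masking mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `head_mask` and `masked_sgd_step`, the plain SGD update, and the weight layout (each head owning a contiguous block of `head_dim` columns) are assumptions made for illustration only. Real frameworks may store attention projections in a different layout, and how the heads are selected (the relevance scoring step) is not shown.

```python
from typing import List, Set

Matrix = List[List[float]]


def head_mask(num_heads: int, head_dim: int, selected: Set[int]) -> List[float]:
    """Return one mask entry per column: 1.0 if the column belongs to a
    selected head, 0.0 otherwise. Assumes (hypothetically) that head h owns
    columns [h * head_dim, (h + 1) * head_dim)."""
    return [
        1.0 if (col // head_dim) in selected else 0.0
        for col in range(num_heads * head_dim)
    ]


def masked_sgd_step(weight: Matrix, grad: Matrix, mask: List[float], lr: float) -> Matrix:
    """Apply a plain SGD step, zeroing the gradient for every column whose
    mask entry is 0.0 so that unselected heads keep their original weights."""
    return [
        [w - lr * g * m for w, g, m in zip(w_row, g_row, mask)]
        for w_row, g_row in zip(weight, grad)
    ]


# Toy example: 2 heads of width 2; only head 1 is treated as task-relevant.
mask = head_mask(num_heads=2, head_dim=2, selected={1})
weight = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
grad = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
updated = masked_sgd_step(weight, grad, mask, lr=0.1)
```

In this toy run the columns of head 0 are left untouched while the columns of head 1 move by `lr * grad`. Freezing the unselected heads this way is one plausible reading of how the source-language behavior could be preserved during target-language tuning.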
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
