TL;DR
This paper analyzes how supervised fine-tuning affects individual model layers. It finds that middle layers are stable while final layers are sensitive, and uses this to motivate an efficient tuning method that improves alignment with fewer parameters.
Contribution
It introduces Mid-Block Efficient Tuning, a layer-wise approach that selectively updates critical intermediate layers, outperforming standard methods like LoRA.
Findings
Middle layers are stable during fine-tuning.
Final layers show high sensitivity to updates.
The proposed method outperforms standard LoRA by up to 10.2% on GSM8K (OLMo2-7B).
Abstract
While critical for alignment, Supervised Fine-Tuning (SFT) incurs the risk of catastrophic forgetting, yet the layer-wise emergence of instruction-following capabilities remains elusive. We investigate this mechanism via a comprehensive analysis utilizing information-theoretic, geometric, and optimization metrics across model scales (1B-32B). Our experiments reveal a distinct depth-dependent pattern: middle layers (20%-80% of depth) are stable, whereas final layers exhibit high sensitivity. Leveraging this insight, we propose Mid-Block Efficient Tuning, which selectively updates these critical intermediate layers. Empirically, our method outperforms standard LoRA by up to 10.2% on GSM8K (OLMo2-7B) with reduced parameter overhead, demonstrating that effective alignment is architecturally localized rather than distributed. The code is publicly available at…
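The abstract identifies the middle block as roughly the 20%-80% depth range but does not say how that range maps to concrete layer indices. The sketch below shows one plausible mapping, not the authors' implementation. The function name `mid_block_layers`, the floor/ceil rounding, and the default bounds are illustrative assumptions.

```python
import math


def mid_block_layers(num_layers: int, lo: float = 0.2, hi: float = 0.8) -> list[int]:
    """Return 0-based indices of layers in the approximate [lo, hi] depth range.

    Hypothetical helper: the 20%-80% range comes from the abstract, but the
    rounding rule (floor for the start, ceil for the end) is an assumption.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    if not 0.0 <= lo < hi <= 1.0:
        raise ValueError("expected 0 <= lo < hi <= 1")
    start = math.floor(lo * num_layers)  # first layer included
    end = math.ceil(hi * num_layers)     # one past the last layer included
    return list(range(start, end))


# Example: a 32-layer decoder keeps layers 6..25 as the tunable middle block.
layers = mid_block_layers(32)
print(layers[0], layers[-1], len(layers))
```

A list like this could then restrict adapter placement to the selected layers, for example through the `layers_to_transform` option of PEFT's `LoraConfig`. Whether the paper's released code does so is not stated in the abstract.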
