A Layer-wise Analysis of Supervised Fine-Tuning

Qinghua Zhao; Xueling Gong; Xinyu Chen; Zhongfeng Kang; Xinlu Li

arXiv:2604.11838·cs.LG·April 15, 2026

TL;DR

This paper analyzes how supervised fine-tuning affects individual model layers. It finds that middle layers are stable while final layers are sensitive, and uses this observation to motivate an efficient tuning method that improves alignment with fewer parameters.

Contribution

It introduces Mid-Block Efficient Tuning, a layer-wise approach that selectively updates critical intermediate layers and outperforms standard LoRA by up to 10.2% on GSM8K (OLMo2-7B) with reduced parameter overhead.

Findings

01

Middle layers (20%-80% of model depth) are stable during fine-tuning.

02

Final layers show high sensitivity to updates.

03

The proposed method outperforms standard LoRA by up to 10.2% on GSM8K (OLMo2-7B).

Abstract

While critical for alignment, Supervised Fine-Tuning (SFT) incurs the risk of catastrophic forgetting, yet the layer-wise emergence of instruction-following capabilities remains elusive. We investigate this mechanism via a comprehensive analysis utilizing information-theoretic, geometric, and optimization metrics across model scales (1B-32B). Our experiments reveal a distinct depth-dependent pattern: middle layers (20%-80%) are stable, whereas final layers exhibit high sensitivity. Leveraging this insight, we propose Mid-Block Efficient Tuning, which selectively updates these critical intermediate layers. Empirically, our method outperforms standard LoRA by up to 10.2% on GSM8K (OLMo2-7B) with reduced parameter overhead, demonstrating that effective alignment is architecturally localized rather than distributed. The code is publicly available at…
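
The 20%-80% depth range mentioned in the abstract can be turned into concrete layer indices for a given model. The following is a minimal sketch, not the authors' implementation: the function name `mid_block_layers`, the 0-based indexing, and the floor/ceil rounding at the boundaries are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def mid_block_layers(num_layers: int, lo: float = 0.2, hi: float = 0.8) -> list[int]:
    """Return 0-based layer indices from floor(lo * n) up to, but not
    including, ceil(hi * n).

    Hypothetical helper: the rounding rule is an assumption, not taken
    from the paper.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    if not 0.0 <= lo < hi <= 1.0:
        raise ValueError("expected 0 <= lo < hi <= 1")
    start = math.floor(lo * num_layers)
    stop = math.ceil(hi * num_layers)
    return list(range(start, stop))


if __name__ == "__main__":
    # Example: a 32-layer decoder stack (layer count chosen for illustration).
    selected = mid_block_layers(32)
    print(f"{len(selected)} of 32 layers selected: {selected[0]}..{selected[-1]}")
```

A list like this could then be handed to whatever mechanism restricts training to specific layers, for example by freezing parameters outside the selected blocks or by using an adapter library's per-layer filter; which mechanism the paper uses is not stated in the abstract.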

Code & Models

Repositories

https://anonymous.4open.science/r/base_sft
