Understanding Layer Significance in LLM Alignment
Guangyuan Shi, Zexin Lu, Xiaoyu Dong, Wenlong Zhang, Xuanyu Zhang, Yujie Feng, Xiao-Ming Wu

TL;DR
This paper introduces ILA, a method for identifying the layers of an LLM that matter most during alignment. It finds that a small subset of layers is most influential, and that focusing fine-tuning on those layers improves efficiency and model performance.
Contribution
The paper presents a novel approach, ILA, to determine layer importance in LLM alignment, showing that key layers are consistent across datasets and that focusing on them enhances fine-tuning.
Findings
Important layers overlap by nearly 90% across datasets
Freezing non-essential layers improves performance
Tuning critical layers boosts fine-tuning efficiency
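A layer-overlap figure like the one above can be computed with Jaccard similarity, which one of the reviews below notes the paper uses for comparison. The following is a minimal sketch of that measure only: the function name `jaccard` and the layer indices are hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of layer ids."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical layer indices flagged as important on two alignment datasets.
layers_dataset_a = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
layers_dataset_b = {0, 1, 2, 3, 4, 5, 6, 7, 8, 10}

# Nine shared layers out of eleven distinct layers in total.
overlap = jaccard(layers_dataset_a, layers_dataset_b)
print(f"Jaccard overlap: {overlap:.3f}")
```

A value close to 1 means the two datasets select almost the same layers; a value of 0 means the selections are disjoint.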
Abstract
Aligning large language models (LLMs) through supervised fine-tuning is essential for tailoring them to specific applications. Recent studies suggest that alignment primarily adjusts a model's presentation style rather than its foundational knowledge, indicating that only certain components of the model are significantly impacted. To uncover how alignment affects model behavior at a granular level, we propose identifying which layers within LLMs are most critical to the alignment process. Our approach, named ILA, involves learning a binary mask for the parameter changes in each layer during alignment, as an indicator of layer significance. Experimental results reveal that, despite substantial differences in alignment datasets, the important layers of a model identified by ILA exhibit nearly 90% overlap, highlighting fundamental patterns in LLM alignment. The results also indicate that…
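To make the idea of a binary mask over per-layer parameter changes concrete, the sketch below applies a given mask to the difference between aligned and pretrained weights: layers with mask 1 keep their alignment update, and layers with mask 0 revert to their pretrained values. This illustrates the masking step only. How ILA learns the mask is not described in the text above, and the function name `apply_layer_mask`, the layer names, and the toy weights are all hypothetical.

```python
def apply_layer_mask(pretrained, aligned, mask):
    """Keep the alignment update only for layers whose mask entry is 1.

    pretrained, aligned: dicts mapping a layer name to a flat list of weights.
    mask: dict mapping a layer name to 0 or 1 (assumed to be given).
    """
    merged = {}
    for name, base in pretrained.items():
        # Per-layer parameter change introduced by alignment.
        delta = [a - b for a, b in zip(aligned[name], base)]
        # Masked update: base + mask * delta.
        merged[name] = [b + mask[name] * d for b, d in zip(base, delta)]
    return merged


# Toy two-layer "model" with made-up weights.
pretrained = {"layer0": [1.0, 2.0], "layer1": [3.0, 4.0]}
aligned = {"layer0": [1.5, 2.5], "layer1": [3.25, 4.75]}
mask = {"layer0": 1, "layer1": 0}  # layer0 treated as important, layer1 frozen

merged = apply_layer_mask(pretrained, aligned, mask)
print(merged)
```

Under this reading, a layer whose mask can be set to 0 without hurting the aligned model's behavior is a candidate for freezing during fine-tuning.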
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The proposed method can offer improvements for fine-tuning LLMs while saving memory usage. 2. Extensive experiments are provided in this paper.
1. **Limited improvements**. According to the results presented, ILA does not bring much improvement. The performances of the models using ILA and the ones without ILA are close. For example, in Table 4, there is only a 0.12% increase from "Full Finetune" to "Full Finetune w/ILA" with the LLAMA 2-7B model. 2. **Potential overlap with existing PEFT methods.** The authors may need to clarify why we need an additional PEFT method and what values freezing unimportant layers can bring compared with the existing
- The work describes a potentially useful method
1. Lack of clarity: the method is not well explained, and important technical details are missing 2. No comparison to similar methods 3. Potentially unsupported inferences from experimental results 4. Lack of mathematical rigor I elaborate on these points below. # Lack of clarity I struggled to understand how the method works. Terms such as "important," "unimportant," "significant," and "insignificant" layers are not defined. Algorithm 1 is not explained nor connected to the formulas in Secti
This paper is well-written and easy to understand. The proposed ILA method is also intuitive and has achieved excellent results.
1. I am somewhat concerned about whether the contributions of this paper are sufficient, as [1][2][3][4] indicate that adjusting certain parameters/layers during the fine-tuning process can indeed lead to effective improvements. Additionally, [5] shows that there is significant redundancy in the parameters during the SFT process. I believe that existing work already highlights the necessity of adjusting certain parameters during the post-training phase. The authors should emphasize the contribut
1) The authors evaluate their method on several modern benchmarks, including MT-Bench 2) Interesting results from the perspective of layer significance. 3) The observed transferability across datasets is a strong indicator of the method’s robustness. I particularly appreciated the inclusion of out-of-distribution testing within the ablation study, along with the use of Jaccard similarities for comparison. 4) The paper benefits from a well-organized structure and visually appealing presentation
Major Weaknesses: 1) Lack of technical details on optimal layers finding. 1.1) This process can require considerable training time before stabilization occurs, especially for larger models. Moreover, it’s difficult to guarantee that stabilization will ever fully occur, given the non-convex nature of the optimization problem. 1.2) Additionally, while the paper reports efficiency improvements, the measurements don’t account for the compute required for pre-training and layer selection. As a resu
On one hand, the authors provide detailed motivation in the introduction explaining why freezing different layers during training is necessary, which can help reduce catastrophic forgetting to some extent. On the other hand, the paper offers comprehensive empirical evidence showing their proposed method achieves generally improved performance across LoRA, AdaLoRA, QLoRA and full fine-tuning.
Indeed, this paper's novelty is limited. The core motivation and main method of freezing certain layers to avoid overfitting was proposed three years ago in paper [1], which even provided finer-grained control over the degree of parameter freezing. In my view, the authors merely validated this approach on Alignment tasks (just one type of fine-tuning task). While I acknowledge the technical implementations differ, given the similar research motivations and the limited application scope of this m
Taxonomy
Topics: Digital Rights Management and Security · Dispute Resolution and Class Actions
