Understanding Performance Collapse in Layer-Pruned Large Language Models via Decision Representation Transitions
Boyu Shi, Chang Liu, ChuanBao Gao, Xu Yang, Xin Geng

TL;DR
This paper investigates why layer pruning causes performance collapse in large language models by analyzing layer-wise decision representations. It finds a sharp decision transition and shows that the layers before it, the Silent Phase, are highly sensitive to pruning.
Contribution
The paper introduces two decision-based metrics, Decision Margin and Option Frequency, and an Iterative Pruning method to explain the collapse mechanism, and it shows that the Silent Phase is critical to model performance.
Findings
Pruning the Silent Phase causes immediate performance collapse, whereas pruning the Decisive Phase has minimal impact.
A sharp decision transition separates Silent and Decisive phases.
Disrupting the Silent Phase prevents the critical decision transition.
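To make the Silent/Decisive split concrete, here is a minimal sketch of how a transition layer could be located from a per-layer score. It assumes a hypothetical list `margins`, where `margins[i]` is positive when the correct option is preferred at layer `i`. The function names and the "stays positive from here on" rule are illustrative assumptions, not the authors' procedure.

```python
from typing import List, Optional, Sequence, Tuple


def find_transition_layer(margins: Sequence[float]) -> Optional[int]:
    """Return the first layer from which the margin stays positive.

    `margins` is a hypothetical per-layer score (assumption): positive when
    the correct option is preferred at that layer, non-positive otherwise.
    Returns None if the final layers never settle on the correct option.
    """
    transition: Optional[int] = None
    for layer, margin in enumerate(margins):
        if margin > 0:
            if transition is None:
                transition = layer  # candidate start of the Decisive Phase
        else:
            transition = None  # a later dip resets the candidate
    return transition


def split_phases(margins: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Partition layer indices into (silent, decisive) under the rule above."""
    n = len(margins)
    t = find_transition_layer(margins)
    if t is None:
        return list(range(n)), []
    return list(range(t)), list(range(t, n))


if __name__ == "__main__":
    example = [-1.0, -0.5, -0.2, 0.1, -0.3, 0.8, 1.5, 2.0]
    print(find_transition_layer(example))
    print(split_phases(example))
```

Requiring the margin to stay positive, rather than taking the first positive layer, keeps a brief early fluctuation from being mistaken for the transition. Other criteria, such as the largest single-layer jump, would be equally reasonable under this sketch.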
Abstract
Layer pruning efficiently reduces Large Language Model (LLM) computational costs but often triggers sudden performance collapse. Existing representation-based analyses struggle to explain this mechanism. We propose studying pruning through decision representation. Focusing on multiple-choice tasks, we introduce two metrics, Decision Margin and Option Frequency, and an Iterative Pruning method to analyze layer-wise decision dynamics. Our findings reveal a sharp decision transition that partitions the network into two stages: a Silent Phase, where the model cannot yet predict the correct answer, and a Decisive Phase, where the correct prediction emerges. We also find that pruning the Decisive Phase has minimal impact, whereas pruning the Silent Phase triggers immediate performance collapse, highlighting its extreme sensitivity to structural changes. Therefore, we conclude that…
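The abstract names Decision Margin and Option Frequency without defining them in this excerpt. The sketch below shows one plausible reading for multiple-choice tasks, assuming each layer yields one score per answer option (for example, via a logit-lens style projection). The definitions, function names, and inputs are all assumptions made for illustration and may differ from the paper's.

```python
from collections import Counter
from typing import List, Sequence


def decision_margin(option_scores: Sequence[float], correct: int) -> float:
    """Assumed definition: score of the correct option minus the best other option.

    Positive values mean the layer already prefers the correct answer.
    """
    best_other = max(s for i, s in enumerate(option_scores) if i != correct)
    return option_scores[correct] - best_other


def option_frequency(
    batch_scores: Sequence[Sequence[float]], num_options: int
) -> List[float]:
    """Assumed definition: fraction of questions on which each option scores highest.

    A layer that always picks the same option regardless of the question
    would show a frequency near 1.0 for that option.
    """
    counts = Counter(
        max(range(num_options), key=lambda i: scores[i]) for scores in batch_scores
    )
    total = len(batch_scores)
    return [counts[i] / total for i in range(num_options)]


if __name__ == "__main__":
    # Hypothetical scores for options A-D at a single layer, four questions.
    layer_scores = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [2.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 3.0],
    ]
    print(decision_margin([0.1, 2.0, 0.5, 0.3], correct=1))
    print(option_frequency(layer_scores, num_options=4))
```

Computing both quantities at every layer, before and after removing a block of layers, would show whether pruning delays or eliminates the point at which the margin turns positive.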
