Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs
Hanting Chen, Jiarui Qin, Jialong Guo, Tao Yuan, Yichun Yin, Huiling Zhen, Yasheng Wang, Jinpeng Li, Xiaojun Meng, Meng Zhang, Rongju Ruan, Zheyuan Bai, Yehui Tang, Can Chen, Xinghao Chen, Fisher Yu, Ruiming Tang, Yunhe Wang (and Other Contributors)

TL;DR
Pangu Light introduces a weight re-initialization framework for structured pruning of LLMs. It improves the trade-off between efficiency and accuracy by mitigating the performance degradation that aggressive model compression causes.
Contribution
The paper presents new re-initialization techniques, such as CLAP and SLNP, that improve pruning effectiveness and model performance, especially on specialized hardware such as Ascend NPUs.
Findings
Pangu Light outperforms baseline pruning methods in accuracy and efficiency.
On Ascend NPUs, Pangu Light-32B achieves higher scores and throughput than Qwen3-32B.
Re-initialization techniques mitigate performance drops during aggressive pruning.
Abstract
Large Language Models (LLMs) deliver state-of-the-art capabilities across numerous tasks, but their immense size and inference costs pose significant computational challenges for practical deployment. While structured pruning offers a promising avenue for model compression, existing methods often struggle with the detrimental effects of aggressive, simultaneous width and depth reductions, leading to substantial performance degradation. This paper argues that a critical, often overlooked aspect of making such aggressive joint pruning viable is the strategic re-initialization and adjustment of the remaining weights, which improves the model's accuracy after post-pruning training. We introduce Pangu Light, a framework for LLM acceleration centered around structured pruning coupled with novel weight re-initialization techniques designed to address this "missing piece". Our framework systematically…
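To make "structured width pruning" concrete, here is a minimal, generic sketch in plain Python. It is not the paper's CLAP or SLNP technique, and it performs no re-initialization. The function name `prune_hidden_neurons`, the list-of-lists weight layout, and the norm-product importance score are all assumptions chosen for illustration. It shows only the basic idea that whole hidden neurons are removed, so the incoming rows and the matching outgoing columns must be dropped together.

```python
import math

def prune_hidden_neurons(w_in, w_out, keep):
    """Generic structured width pruning of a two-layer MLP (illustrative only).

    w_in  has shape hidden x d_in:  row i produces hidden neuron i.
    w_out has shape d_out x hidden: column i consumes hidden neuron i.
    Returns the pruned matrices and the sorted indices of the kept neurons.
    """
    hidden = len(w_in)
    scores = []
    for i in range(hidden):
        # Assumed importance score: L2 norm of the incoming row times
        # L2 norm of the outgoing column. This is a simple heuristic,
        # not the criterion used in the paper.
        in_norm = math.sqrt(sum(v * v for v in w_in[i]))
        out_norm = math.sqrt(sum(row[i] * row[i] for row in w_out))
        scores.append(in_norm * out_norm)

    # Keep the `keep` highest-scoring neurons, preserving their original order.
    ranked = sorted(range(hidden), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])

    # Remove whole neurons: rows of w_in and the matching columns of w_out.
    new_w_in = [list(w_in[i]) for i in kept]
    new_w_out = [[row[i] for i in kept] for row in w_out]
    return new_w_in, new_w_out, kept

# Toy example: three hidden neurons, the middle one has near-zero weights.
w_in = [[1.0, 0.0], [0.01, 0.02], [0.0, 2.0]]
w_out = [[1.0, 0.5, 1.0]]
pruned_in, pruned_out, kept = prune_hidden_neurons(w_in, w_out, keep=2)
print(kept)
```

In this toy case the low-magnitude middle neuron is dropped and the two matrices stay shape-compatible. The abstract's argument is that after removals like this, how the remaining weights are re-initialized and adjusted matters for the accuracy recovered in post-pruning training.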
Taxonomy
Topics: Advanced Surface Polishing Techniques · Modular Robots and Swarm Intelligence
Methods: Softmax · Attention Is All You Need · Pruning · Root Mean Square Layer Normalization
