Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs

Hanting Chen; Jiarui Qin; Jialong Guo; Tao Yuan; Yichun Yin; Huiling Zhen; Yasheng Wang; Jinpeng Li; Xiaojun Meng; Meng Zhang; Rongju Ruan; Zheyuan Bai; Yehui Tang; Can Chen; Xinghao Chen; Fisher Yu; Ruiming Tang; Yunhe Wang (and Other Contributors)

arXiv:2505.20155 · cs.CL · May 27, 2025

Open Access

TL;DR

Pangu Light introduces a weight re-initialization framework for structured pruning of LLMs. By re-initializing and adjusting the weights that remain after pruning, it reduces the performance degradation caused by aggressive compression and improves the trade-off between efficiency and accuracy.

Contribution

The paper presents new re-initialization techniques, including Cross-Layer Attention Pruning (CLAP) and Stabilized LayerNorm Pruning (SLNP), that make pruning more effective and improve model performance, especially on specialized hardware such as Ascend NPUs.

Findings

01. Pangu Light outperforms baseline pruning methods in accuracy and efficiency.

02. On Ascend NPUs, Pangu Light-32B achieves higher scores and throughput than Qwen3-32B.

03. Re-initialization techniques mitigate performance drops during aggressive pruning.

Abstract

Large Language Models (LLMs) deliver state-of-the-art capabilities across numerous tasks, but their immense size and inference costs pose significant computational challenges for practical deployment. While structured pruning offers a promising avenue for model compression, existing methods often struggle with the detrimental effects of aggressive, simultaneous width and depth reductions, leading to substantial performance degradation. This paper argues that a critical, often overlooked aspect of making such aggressive joint pruning viable is the strategic re-initialization and adjustment of the remaining weights, which improves the model's accuracy after post-pruning training. We introduce Pangu Light, a framework for LLM acceleration centered around structured pruning coupled with novel weight re-initialization techniques designed to address this "missing piece". Our framework systematically…

Taxonomy

Topics: Advanced Surface Polishing Techniques · Modular Robots and Swarm Intelligence

Methods: Softmax · Attention Is All You Need · Pruning · Root Mean Square Layer Normalization