From Pruning to Grafting: Dynamic Knowledge Redistribution via Learnable Layer Fusion
Zehua Pei, Hui-Ling Zhen, Xianzhi Yu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu

TL;DR
FuseGPT compresses GPT models by grafting the knowledge of pruned transformer blocks onto surviving layers through learnable fusion instead of discarding them. At 25% sparsity it reports lower perplexity than prior methods and a 1.33× inference speedup.
Contribution
The paper proposes FuseGPT, a pruning paradigm that combines learnable low-rank layer fusion with Macro Influence (MI), a block-importance metric re-evaluated as pruning proceeds, and reports better GPT compression results than prior pruning methods.
Findings
Achieves lower perplexity at 25% sparsity compared to prior methods.
Improves zero-shot reasoning by up to 4.5 points.
Delivers 1.33× inference speedup with 25% memory reduction.
Abstract
Structured pruning of Generative Pre-trained Transformers (GPTs) offers a promising path to efficiency but often suffers from irreversible performance degradation due to the discarding of transformer blocks. In this paper, we introduce FuseGPT, a compression paradigm that reframes structured pruning as iterative knowledge grafting rather than simple removal. Motivated by the observation that linear block merging fails to capture non-linear feature disparities and that block importance fluctuates dynamically during pruning, FuseGPT employs a dual-strategy pipeline. First, we propose Macro Influence (MI), a dynamic fusion-aware metric that continuously re-evaluates block redundancy as the network topology evolves. Second, instead of rigid parameter averaging, we introduce a learnable low-rank fusion mechanism that adaptively grafts the knowledge of pruned blocks onto surviving layers via…
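The abstract is cut off before it specifies the fusion rule, so the following is only a minimal sketch of what a learnable low-rank fusion step could look like, not the authors' implementation. It assumes the fused weight takes the form W_keep + (A·B) ⊙ W_pruned, where A and B are small trainable factors; that form, and the names `fuse_low_rank`, `w_keep`, and `w_pruned`, are assumptions introduced here for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def fuse_low_rank(w_keep, w_pruned, a, b):
    """Hypothetical fusion rule: W_keep + (A @ B) * W_pruned (elementwise product).

    A is (rows x r) and B is (r x cols). Their product is a low-rank
    coefficient matrix that scales how much of each pruned weight is
    grafted onto the surviving weight. This form is an assumption.
    """
    coeff = matmul(a, b)
    return [
        [wk + c * wp for wk, c, wp in zip(row_k, row_c, row_p)]
        for row_k, row_c, row_p in zip(w_keep, coeff, w_pruned)
    ]

rows, cols, rank = 4, 3, 2
rng = random.Random(0)
w_keep = [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
w_pruned = [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Zero-initialising B makes the coefficient matrix zero, so the fused
# weights start out equal to the surviving weights (a design assumption).
a = [[rng.uniform(-1, 1) for _ in range(rank)] for _ in range(rows)]
b = [[0.0] * cols for _ in range(rank)]
fused_init = fuse_low_rank(w_keep, w_pruned, a, b)
```

In this sketch only A and B would be trained, which adds rank × (rows + cols) parameters per fused matrix rather than rows × cols, and the zero initialisation means fine-tuning would begin from the plainly pruned model.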
Taxonomy
Topics: Advanced Neural Network Applications · Time Series Analysis and Forecasting · Neural Networks and Applications
Methods: Pruning
