Strategic Fusion Optimizes Transformer Compression
Md Shoaibur Rahman

TL;DR
This paper introduces strategic fusion, which combines multiple layer-pruning signals through linear regression or a random forest, and pairs it with knowledge distillation to compress transformer models while limiting accuracy loss across nine datasets.
Contribution
It proposes a fusion strategy that combines the signals of individual pruning strategies to decide which transformer layers to remove, outperforms single-signal strategies on most datasets, and uses knowledge distillation to mitigate the accuracy lost to pruning.
Findings
Random forest fusion outperforms individual strategies on seven of the nine datasets.
Knowledge distillation improves accuracy and size ratio.
Random forest fusion is near-optimal on the two datasets where it is not the best strategy, and the distilled random forest surpasses the original accuracy on six datasets.
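To make the distillation finding concrete, here is a minimal sketch of a standard temperature-scaled distillation loss for a single example. The summary does not state which loss the paper uses, so the formulation below (soft-target KL divergence blended with hard-label cross-entropy) and the names `distillation_loss`, `temperature`, and `alpha` are assumptions for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend of soft-target KL divergence and hard-label cross-entropy.

    Assumed formulation, not taken from the paper:
    alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label)
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Hypothetical logits: the pruned student roughly tracks the unpruned teacher.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.8]
loss = distillation_loss(student, teacher, label=0)
```

In this sketch the pruned model plays the student and the original model plays the teacher; when the two produce identical logits the KL term vanishes and only the hard-label term remains.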
Abstract
This study investigates transformer model compression by systematically pruning its layers. We evaluated 14 pruning strategies across nine diverse datasets, including 12 strategies based on different signals obtained from layer activations, mutual information, gradients, weights, and attention. To address the limitations of single-signal strategies, we introduced two fusion strategies, linear regression and random forest, which combine individual strategies (i.e., strategic fusion), for more informed pruning decisions. Additionally, we applied knowledge distillation to mitigate any accuracy loss during layer pruning. Our results reveal that random forest strategic fusion outperforms individual strategies in seven out of nine datasets and achieves near-optimal performance in the other two. The distilled random forest surpasses the original accuracy in six datasets and mitigates accuracy…
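The fusion idea in the abstract can be sketched as a small regression problem: each layer gets one score per individual strategy, and a fitted model combines those scores into a single importance value used to pick layers to prune. The sketch below uses the linear regression variant with NumPy. The training target (a per-layer accuracy drop measured on held-out data), the synthetic data, and the names `fit_linear_fusion`, `fused_importance`, and `layers_to_prune` are all assumptions, since the summary does not describe how the fusion models are trained.

```python
import numpy as np


def _with_intercept(signals):
    """Append a column of ones so the fit includes an intercept term."""
    return np.hstack([signals, np.ones((signals.shape[0], 1))])


def fit_linear_fusion(signals, target):
    """Least-squares weights mapping per-layer signals to a target importance."""
    weights, *_ = np.linalg.lstsq(_with_intercept(signals), target, rcond=None)
    return weights


def fused_importance(signals, weights):
    """Fused importance score for each layer."""
    return _with_intercept(signals) @ weights


def layers_to_prune(importance, k):
    """Indices of the k layers with the lowest fused importance."""
    return sorted(np.argsort(importance)[:k].tolist())


# Synthetic data: 12 layers, 4 signals (stand-ins for e.g. activation,
# gradient, weight, and attention scores).
rng = np.random.default_rng(0)
signals = rng.random((12, 4))

# Hypothetical target: accuracy drop observed when each layer is removed.
target = signals @ np.array([0.5, 0.1, 0.3, 0.1])

weights = fit_linear_fusion(signals, target)
importance = fused_importance(signals, weights)
pruned = layers_to_prune(importance, k=3)
```

Swapping the least-squares fit for a tree ensemble, such as a random forest regressor, would give the other fusion variant named in the abstract while keeping the same score-then-prune structure.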
Taxonomy
Topics: Power Transformer Diagnostics and Insulation
Methods: Linear Regression · Knowledge Distillation · Pruning
