Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation
Xizi Chen, Jingyang Zhu, Jingbo Jiang, Chi-Ying Tsui

TL;DR
This paper combines fine-grained pruning with a weight permutation scheme that packs the sparse CNN weight matrix into a small, dense format, improving compression rate, throughput, and energy efficiency on regular hardware such as systolic arrays.
Contribution
It proposes a new permutation-based compression method that exploits fine-grained sparsity, achieving higher compression rates and better hardware utilization than existing approaches.
Findings
Matrix compression rate improved from 5.88x to 14.13x
Throughput increased by 2.75x
Energy efficiency improved by 1.86x
Abstract
The unstructured sparsity after pruning poses a challenge to the efficient implementation of deep learning models in existing regular architectures like systolic arrays. On the other hand, coarse-grained structured pruning is suitable for implementation in regular architectures but tends to have higher accuracy loss than unstructured pruning when the pruned models are of the same size. In this work, we propose a model compression method based on a novel weight permutation scheme to fully exploit the fine-grained weight sparsity in the hardware design. Through permutation, the optimal arrangement of the weight matrix is obtained, and the sparse weight matrix is further compressed to a small and dense format to make full use of the hardware resources. Two pruning granularities are explored. In addition to the unstructured weight pruning, we also propose a more fine-grained subword-level…
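The idea of packing a sparse weight matrix into a smaller dense one can be illustrated with a toy sketch. This is not the authors' permutation algorithm, which the truncated abstract does not specify. It assumes simple magnitude pruning and a greedy grouping of rows whose nonzero columns do not collide, and the names `magnitude_prune` and `greedy_pack_rows` are hypothetical.

```python
import numpy as np


def magnitude_prune(w, sparsity):
    """Unstructured pruning: zero the smallest-magnitude entries.

    Assumption: plain magnitude thresholding. Ties at the threshold
    are all pruned, so the achieved sparsity can exceed the target.
    """
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    pruned = w.copy()
    pruned[np.abs(w) <= threshold] = 0.0
    return pruned


def greedy_pack_rows(w):
    """Group rows whose nonzero column positions do not overlap.

    Rows in one group could share a single dense row of storage.
    Illustrative first-fit heuristic only, not the paper's scheme.
    """
    groups = []  # each item: (occupied column mask, list of row indices)
    for r in range(w.shape[0]):
        mask = w[r] != 0
        for occupied, rows in groups:
            if not np.any(occupied & mask):
                occupied |= mask  # in-place update of the stored mask
                rows.append(r)
                break
        else:
            groups.append((mask.copy(), [r]))
    return [rows for _, rows in groups]


# Toy sparse matrix: rows 0, 1 and 3 have disjoint nonzero columns.
w = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [3.0, 0.0, 0.0, 4.0],
              [0.0, 0.0, 5.0, 0.0]])
groups = greedy_pack_rows(w)
row_compression = w.shape[0] / len(groups)
```

In this toy case four sparse rows pack into two groups. A real design would also need to store which original row each packed entry came from, and that bookkeeping is omitted here.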
Taxonomy
Topics: Polydiacetylene-based materials and applications · Plant Molecular Biology Research
Methods: Pruning
