Network Pruning for Low-Rank Binary Indexing
Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Parichay Kapoor, Gu-Yeon Wei

TL;DR
This paper introduces a network pruning method that produces low-rank binary index matrices, reducing the memory needed to store index data for sparse DNN models while maintaining performance.
Contribution
It proposes a pruning technique that finds a fine-grained pruning mask decomposable into two binary matrices, together with a tile-based factorization technique, to compress index data and reduce memory usage.
Findings
Achieves higher compression ratios than previous sparse formats.
Maintains pruning effectiveness with fewer indexes.
Reduces memory footprint for sparse neural network models.
Abstract
Pruning is an efficient model compression technique to remove redundancy in the connectivity of deep neural networks (DNNs). Computations using sparse matrices obtained by pruning parameters, however, exhibit vastly different parallelism depending on the index representation scheme. As a result, fine-grained pruning has not gained much attention due to its irregular index form leading to large memory footprint and low parallelism for convolutions and matrix multiplications. In this paper, we propose a new network pruning technique that generates a low-rank binary index matrix to compress index data while decompressing index data is performed by simple binary matrix multiplication. This proposed compression method finds a particular fine-grained pruning mask that can be decomposed into two binary matrices. We also propose a tile-based factorization technique that not only lowers memory…
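The decompression step described in the abstract can be illustrated with a minimal sketch. It assumes that "binary matrix multiplication" means a Boolean product (OR of ANDs) and that a 1 in the mask marks a kept weight; neither detail is confirmed by the excerpt above. The names `decompress_mask`, `index_bits`, `left`, and `right` are hypothetical, and the factors here are random rather than produced by the paper's pruning method.

```python
import numpy as np


def decompress_mask(left, right):
    """Rebuild an (m x n) binary mask from an (m x k) and a (k x n) binary factor.

    Assumption: Boolean product, mask[i, j] = OR over r of (left[i, r] AND right[r, j]).
    """
    # An integer matmul counts the r for which both factors are 1;
    # a count above zero is equivalent to the OR being true.
    counts = left.astype(np.int64) @ right.astype(np.int64)
    return counts > 0


def index_bits(m, n, rank):
    """Bits needed for a dense 1-bit-per-weight mask versus the two binary factors."""
    dense = m * n
    factored = rank * (m + n)
    return dense, factored


# Illustrative sizes and random factors (assumptions, not values from the paper).
rng = np.random.default_rng(0)
m, n, rank = 8, 6, 2
left = rng.random((m, rank)) < 0.5
right = rng.random((rank, n)) < 0.5

mask = decompress_mask(left, right)

# Apply the mask to a weight matrix: entries where the mask is False are pruned.
weights = rng.standard_normal((m, n))
pruned = np.where(mask, weights, 0.0)

dense_bits, factored_bits = index_bits(m, n, rank)
```

For these illustrative sizes the two factors take `rank * (m + n)` bits instead of `m * n` bits for a dense mask, so the saving grows as the rank shrinks relative to the matrix dimensions. How the paper chooses the rank, and how its tile-based variant partitions the matrix, is not covered by this sketch.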
Taxonomy
Topics: Neural Networks and Applications · Advanced Graph Neural Networks · Tensor decomposition and applications
Methods: Pruning
