TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks
Xiang Meng, Mehdi Makni, Rahul Mazumder

TL;DR
This paper presents a scalable, efficient algorithm for generating transposable N:M sparse masks in neural networks, enabling better hardware acceleration and compression without sacrificing model performance.
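To make "transposable" concrete, here is a minimal sketch in plain Python. It is not the authors' code. It assumes magnitude-based selection, treats N:M as "at most N nonzeros per group of M consecutive entries", and assumes every dimension is a multiple of M. The helper names (`row_nm_mask`, `is_nm_rows`, `is_transposable_nm`) are hypothetical. A standard row-wise N:M mask can fail the same check on its columns, so it is not preserved under transposition.

```python
def row_nm_mask(weights, n, m):
    """Standard (row-wise) N:M mask: in each row, keep the n largest-magnitude
    weights in every group of m consecutive entries (row length assumed divisible by m)."""
    mask = []
    for row in weights:
        mask_row = [0] * len(row)
        for start in range(0, len(row), m):
            group = range(start, start + m)
            for j in sorted(group, key=lambda k: abs(row[k]), reverse=True)[:n]:
                mask_row[j] = 1
        mask.append(mask_row)
    return mask


def transpose(mat):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*mat)]


def is_nm_rows(mask, n, m):
    """True if every group of m consecutive entries in every row has at most n ones."""
    return all(sum(row[s:s + m]) <= n for row in mask for s in range(0, len(row), m))


def is_transposable_nm(mask, n, m):
    """Transposable N:M: the N:M condition holds for the mask and for its transpose."""
    return is_nm_rows(mask, n, m) and is_nm_rows(transpose(mask), n, m)


W = [
    [4, 3, 1, 0.5],
    [5, 2, 0.1, 0.2],
    [6, 1, 0.3, 0.4],
    [7, 0.5, 0.2, 0.1],
]
mask = row_nm_mask(W, 2, 4)  # every row keeps columns 0 and 1
print(is_nm_rows(mask, 2, 4), is_transposable_nm(mask, 2, 4))
```

Here the row-wise 2:4 mask is valid for the forward pass. Column 0 of the mask, however, holds four nonzeros, so the transposed matrix used in the backward pass is not 2:4 sparse.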
Contribution
We introduce a novel, scalable solver for transposable N:M masks using optimal transport, enabling application to billion-parameter models and arbitrary N:M ratios.
Findings
Achieves up to 100x speedup over existing methods.
With 16:32 sparsity, maintains model performance close to that of dense models.
Outperforms standard 2:4 sparse models in experiments.
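Both 2:4 and 16:32 keep 50% of the weights. The following arithmetic gives one intuition for why a larger M can help at the same density. It is our own illustration, not a claim from the paper, and it counts only row-wise N:M patterns, not transposable ones. The sketch counts how many masks each pattern allows over a run of 32 consecutive weights.

```python
import math

# 2:4 over 32 weights: 8 independent groups, each choosing 2 of 4 positions.
patterns_2_4 = math.comb(4, 2) ** 8

# 16:32 over the same 32 weights: one group choosing 16 of 32 positions.
patterns_16_32 = math.comb(32, 16)

print(patterns_2_4, patterns_16_32)
```

Every 2:4 mask is also a valid 16:32 mask, but 16:32 allows roughly 358 times as many choices here. This gives the pruning step more freedom to keep the important weights.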
Abstract
Network pruning reduces the computational requirements of large neural networks, with N:M sparsity -- retaining only N out of every M consecutive weights -- offering a compelling balance between compressed model quality and hardware acceleration. However, N:M sparsity only accelerates forward-pass computations, as N:M patterns are not preserved during matrix transposition, limiting efficiency during training where both passes are computationally intensive. While transposable N:M sparsity has been proposed to address this limitation, existing methods for finding transposable N:M sparse masks either fail to scale to large models or are restricted to M=4, which results in a suboptimal compression-accuracy trade-off. We introduce an efficient solver for transposable N:M masks that scales to billion-parameter models. We formulate mask generation as optimal transport problems and solve them through…
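The abstract is truncated before it describes the solver, so the following is not the authors' algorithm. It is a minimal sketch of the general idea for a single M×M block, under several assumptions of our own:

- Weight importance is the magnitude |W_ij|.
- The binary mask is relaxed to 0 ≤ S ≤ 1 with row and column sums equal to N.
- An entropy term of weight `eps` is added, which is consistent with the "Entropy Regularization" method tag.
- The relaxation is solved by cycling KL projections with Dykstra's correction for the capacity constraint. This solver is our choice.
- The result is rounded greedily.

All function names and the parameters `eps` and `iters` are hypothetical.

```python
import math


def dykstra_capacitated_ot(scores, n, eps=0.5, iters=200):
    """Hypothetical helper (sketch, not TSENOR's implementation).
    Approximately solves: max <scores, S> + eps * entropy(S)
    s.t. row sums = n, column sums = n, 0 <= S <= 1."""
    m = len(scores)
    top = max(max(row) for row in scores)
    # Gibbs kernel exp(score / eps), shifted by the max score for numerical stability.
    s = [[math.exp((x - top) / eps) for x in row] for row in scores]
    q = [[1.0] * m for _ in range(m)]  # Dykstra correction for the box constraint
    for _ in range(iters):
        # KL projection onto {row sums = n}: rescale each row.
        for i in range(m):
            r = sum(s[i])
            s[i] = [x * n / r for x in s[i]]
        # KL projection onto {column sums = n}: rescale each column.
        for j in range(m):
            c = sum(s[i][j] for i in range(m))
            for i in range(m):
                s[i][j] *= n / c
        # KL projection onto {S <= 1} (elementwise min), with Dykstra's correction.
        for i in range(m):
            for j in range(m):
                t = s[i][j] * q[i][j]
                s[i][j] = min(t, 1.0)
                q[i][j] = t / s[i][j]
    return s


def greedy_round(frac, n):
    """Hypothetical rounding: take cells in decreasing fractional value while the
    cell's row and column still have fewer than n ones (so at most n per row/column)."""
    m = len(frac)
    cells = sorted(((frac[i][j], i, j) for i in range(m) for j in range(m)), reverse=True)
    mask = [[0] * m for _ in range(m)]
    row_used, col_used = [0] * m, [0] * m
    for _, i, j in cells:
        if row_used[i] < n and col_used[j] < n:
            mask[i][j] = 1
            row_used[i] += 1
            col_used[j] += 1
    return mask


def transposable_block_mask(block, n, eps=0.5, iters=200):
    """Magnitude-based transposable n:m mask for one square m x m block (sketch)."""
    scores = [[abs(x) for x in row] for row in block]
    return greedy_round(dykstra_capacitated_ot(scores, n, eps, iters), n)


block = [
    [9, 8, 0.1, 0.2],
    [7, 9, 0.3, 0.1],
    [0.2, 0.1, 8, 9],
    [0.1, 0.3, 9, 7],
]
mask = transposable_block_mask(block, n=2)
print(mask)
```

Smaller `eps` pushes the fractional plan closer to a 0/1 solution but slows convergence. Greedy rounding guarantees at most N nonzeros per row and column, though not always exactly N. A full-matrix version would apply the same procedure independently to every M×M block, which is easy to batch.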
Taxonomy
Topics: Vehicle License Plate Recognition · DNA and Biological Computing · graph theory and CDMA systems
Methods: Entropy Regularization · Pruning
