Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design
Chao Fang, Wei Sun, Aojun Zhou, Zhongfeng Wang

TL;DR
This paper introduces a co-designed algorithm, architecture, and dataflow approach for efficient N:M sparse DNN training, achieving significant speedups and energy efficiency improvements on FPGA hardware.
Contribution
It proposes a novel bidirectional weight pruning method and a specialized FPGA accelerator supporting N:M sparsity, enhancing training efficiency without sacrificing accuracy.
Findings
Achieves 1.75x speedup over dense training on FPGA.
Improves training throughput by up to 25.22x.
Improves energy efficiency by up to 3.58x.
Abstract
Sparse training is one of the promising techniques to reduce the computational cost of DNNs while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where only N out of consecutive M elements can be nonzero, has attracted attention due to its hardware-friendly pattern and capability of achieving a high sparse ratio. However, the potential to accelerate N:M sparse DNN training has not been fully exploited, and there is a lack of efficient hardware supporting N:M sparse training. To tackle these challenges, this paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design. At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both forward and backward passes of DNN training, which can significantly reduce the…
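To make the N:M constraint in the abstract concrete, the following is a minimal sketch of pruning a weight vector so that at most N of every M consecutive elements stay nonzero. It assumes magnitude-based selection (keep the N largest absolute values per group), which is a common choice but is not stated in the text above. The function name `nm_prune` and the example weights are hypothetical, and this is not the paper's BDWP method.

```python
from typing import List


def nm_prune(row: List[float], n: int, m: int) -> List[float]:
    """Zero all but the n largest-magnitude values in each group of m
    consecutive elements (assumed selection rule, for illustration only)."""
    if not 0 < n <= m:
        raise ValueError("require 0 < n <= m")
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")

    pruned: List[float] = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Rank positions by magnitude, largest first. sorted() is stable,
        # so ties are resolved in favour of the earlier position.
        ranked = sorted(range(m), key=lambda i: -abs(group[i]))
        keep = set(ranked[:n])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


# Hypothetical weights; the 2:4 and 2:8 patterns are illustrative choices.
weights = [0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4]
sparse_2_4 = nm_prune(weights, n=2, m=4)
sparse_2_8 = nm_prune(weights, n=2, m=8)
```

With the same N, a larger M gives a higher sparsity ratio: the 2:4 pattern keeps half of the weights, while 2:8 keeps a quarter.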
Taxonomy
Methods: Pruning
