Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization

H. T. Kung; Bradley McDanel; Sai Qian Zhang

arXiv:1811.04770·cs.LG·November 13, 2018·22 cites

TL;DR

This paper introduces a column combining method for packing sparse CNNs to improve systolic array efficiency, achieving significant gains in utilization, energy efficiency, and latency while maintaining accuracy with minimal retraining data.

Contribution

It proposes a novel column combining technique that increases systolic array utilization and efficiency, with a joint optimization of pruning and retraining for high accuracy.

Findings

1. Approximately 4x increase in array utilization.
2. 3x improvement in energy efficiency.
3. 12x reduction in inference latency.

Abstract

This paper describes a novel approach of packing sparse convolutional neural networks for their efficient systolic array implementations. By combining subsets of columns in the original filter matrix associated with a convolutional layer, we increase the utilization efficiency of the systolic array substantially (e.g., ~4x) due to the increased density of nonzeros in the resulting packed filter matrix. In combining columns, for each row, all filter weights but the one with the largest magnitude are pruned. We retrain the remaining weights to preserve high accuracy. We demonstrate that, in mitigating data privacy concerns, the retraining can be accomplished with only fractions of the original dataset (e.g., 10% for CIFAR-10). We study the effectiveness of this joint optimization for both high utilization and classification accuracy with ASIC and FPGA designs based on efficient bit-serial…
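
The column-combining step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `combine_columns` and `density`, the toy matrix `F`, and the choice of `groups` are all assumptions made for illustration, and how columns are grouped (as well as the retraining step) is outside its scope. It only shows the per-row rule: within each group of columns, keep the largest-magnitude weight and prune the rest.

```python
def combine_columns(F, groups):
    """Pack each group of columns of F into one column (illustrative sketch).

    F      : list of rows, the sparse filter matrix (rows x columns).
    groups : list of lists of column indices; the grouping is a hypothetical
             input here, not derived by any method from the paper.
    Returns (packed, source):
      packed[r][g] is the largest-magnitude weight in row r among groups[g];
      source[r][g] is the original column it came from, or None if every
      weight in that row of the group is zero.
    """
    n_rows = len(F)
    packed = [[0] * len(groups) for _ in range(n_rows)]
    source = [[None] * len(groups) for _ in range(n_rows)]
    for g, cols in enumerate(groups):
        for r in range(n_rows):
            # Column in this group holding the largest-magnitude weight.
            best = max(cols, key=lambda c: abs(F[r][c]))
            if F[r][best] != 0:
                packed[r][g] = F[r][best]   # kept weight
                source[r][g] = best         # all other weights are pruned
    return packed, source


def density(M):
    """Fraction of nonzero entries in a matrix given as a list of rows."""
    total = sum(len(row) for row in M)
    nonzero = sum(1 for row in M for v in row if v != 0)
    return nonzero / total


# Toy 4x4 filter matrix with 6 nonzeros (made-up values).
F = [
    [0, 3, 0, 0],
    [-2, 0, 0, 1],
    [0, 0, 0, 4],
    [5, -1, 0, 0],
]
groups = [[0, 1], [2, 3]]  # assumed grouping: combine columns 0+1 and 2+3

packed, source = combine_columns(F, groups)
before, after = density(F), density(packed)
```

In this toy case the only conflict is in the last row of the first group, where the weight -1 is pruned in favor of 5. The packed matrix has half as many columns and a higher density of nonzeros than `F`, which is the property the abstract credits for the improved systolic array utilization.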

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Advanced Memory and Neural Computing