Barrier-Free Large-Scale Sparse Tensor Accelerator (BARISTA) For Convolutional Neural Networks

Ashish Gondimalla, Sree Charan Gundabolu, T.N. Vijaykumar, and Mithuna Thottethodi

arXiv:2104.08734·cs.AR·May 11, 2021

Open Access

TL;DR

BARISTA is a large-scale sparse CNN accelerator that overcomes scalability barriers by reducing on-chip bandwidth demand, buffering requirements, and load imbalance, achieving significant performance improvements over existing architectures.

Contribution

It introduces the first scalable architecture for sparse CNNs, addressing key issues of bandwidth, buffering, and load balancing at large scales.

Findings

01 Achieves 5.4x performance over dense architectures

02 Reduces on-chip bandwidth demand through request telescoping

03 Demonstrates effective load balancing and buffering strategies

Abstract

Convolutional neural networks (CNNs) are emerging as powerful tools for visual recognition. Recent architecture proposals for sparse CNNs exploit zeros in the feature maps and filters to improve performance and energy efficiency without losing accuracy. Sparse architectures that exploit two-sided sparsity, in both feature maps and filters, have been studied only at small scales (e.g., 1K multiply-accumulate (MAC) units). However, to realize their advantages in full, sparse architectures have to be scaled up to the levels of dense architectures (e.g., 32K MACs in the TPU). Such scaling is challenging because achieving reuse through broadcasts incurs an implicit barrier cost, which raises the inter-related issues of load imbalance, buffering, and on-chip bandwidth demand. SparTen, a previous scheme, addresses one aspect of load balancing but not the other aspects, nor the other issues of buffering and bandwidth. To that…

Taxonomy

Topics: Advanced Neural Network Applications · Tensor decomposition and applications · Stochastic Gradient Optimization Techniques