NeuroMAX: A High Throughput, Multi-Threaded, Log-Based Accelerator for Convolutional Neural Networks

Mahmood Azhar Qureshi; Arslan Munir

arXiv:2007.09578·cs.AR·July 21, 2020

TL;DR

NeuroMAX introduces a multi-threaded, log-based processing element (PE) core for CNN acceleration that delivers a 200% increase in peak throughput per PE count at a 6% area overhead, and pairs it with a 2D weight broadcast dataflow to keep hardware utilization high.

Contribution

The paper presents a multi-threaded, log-based PE core that raises peak throughput per PE count by 200%, together with a 2D weight broadcast dataflow that exploits the core's multi-threading to achieve high hardware utilization in CNN accelerators.

Findings

01 200% increase in peak throughput per PE count

02 6% area overhead compared to a single, linear multiplier PE core with the same output bit precision

03 High hardware utilization across various CNNs

Abstract

Convolutional neural networks (CNNs) require high-throughput hardware accelerators for real-time applications owing to their huge computational cost. Most traditional CNN accelerators rely on single-core, linear processing elements (PEs) in conjunction with 1D dataflows for accelerating convolution operations. This limits the maximum achievable ratio of peak throughput per PE count to unity. Most past works optimize their dataflows to attain close to 100% hardware utilization in order to reach this ratio. In this paper, we introduce a high-throughput, multi-threaded, log-based PE core. The designed core provides a 200% increase in peak throughput per PE count while incurring only a 6% increase in area overhead compared to a single, linear multiplier PE core with the same output bit precision. We also present a 2D weight broadcast dataflow which exploits the multi-threaded nature of the PE…
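
To illustrate why a log-based PE can be cheaper than a linear multiplier, the sketch below shows the general idea of log-domain multiplication: if a weight is stored as a sign and an exponent, multiplying an integer activation by it reduces to a bit shift. This is a minimal illustration and not the NeuroMAX design. The base-2 quantization, the rounding rule, and the function names `log2_quantize` and `shift_multiply` are all assumptions made for the example; the abstract does not state the log base, bit widths, or how zero weights are handled.

```python
import math
from typing import Tuple


def log2_quantize(w: float) -> Tuple[int, int]:
    """Approximate w as sign * 2**exponent (assumed base-2 scheme, illustration only)."""
    if w == 0:
        raise ValueError("zero has no logarithm; a real design needs a special case")
    sign = 1 if w > 0 else -1
    exponent = round(math.log2(abs(w)))  # nearest integer exponent
    return sign, exponent


def shift_multiply(activation: int, sign: int, exponent: int) -> int:
    """Multiply an integer activation by sign * 2**exponent using shifts only."""
    if exponent >= 0:
        product = activation << exponent
    else:
        # Right shift discards low-order bits (floor division by 2**-exponent).
        product = activation >> -exponent
    return sign * product


if __name__ == "__main__":
    sign, exponent = log2_quantize(0.25)
    print(shift_multiply(12, sign, exponent))
```

Because the multiplier is replaced by a shifter, the hardware cost per operation drops, which is the kind of saving that makes it plausible to fit several compute threads into one PE for a small area increase.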

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Adversarial Robustness in Machine Learning

Methods: Convolution