NeuroMAX: A High Throughput, Multi-Threaded, Log-Based Accelerator for Convolutional Neural Networks
Mahmood Azhar Qureshi, Arslan Munir

TL;DR
NeuroMAX introduces a multi-threaded, log-based processing element (PE) core for CNN acceleration that delivers a 200% increase in peak throughput per PE count for only a 6% increase in area, and pairs it with a 2D weight broadcast dataflow to keep hardware utilization high.
Contribution
The paper presents a novel multi-threaded, log-based PE core and a 2D weight broadcast dataflow that exploits the core's multi-threaded nature, raising peak throughput per PE count beyond the unity ratio that bounds single-core, linear PE designs.
Findings
200% increase in peak throughput per PE count
6% increase in area compared to a single, linear multiplier PE core with the same output bit precision
High hardware utilization across various CNNs
Abstract
Convolutional neural networks (CNNs) require high-throughput hardware accelerators for real-time applications owing to their huge computational cost. Most traditional CNN accelerators rely on single-core, linear processing elements (PEs) in conjunction with 1D dataflows for accelerating convolution operations. This limits the maximum achievable ratio of peak throughput per PE count to unity. Most past works optimize their dataflows to attain close to 100% hardware utilization to reach this ratio. In this paper, we introduce a high-throughput, multi-threaded, log-based PE core. The designed core provides a 200% increase in peak throughput per PE count while only incurring a 6% increase in area overhead compared to a single, linear multiplier PE core with the same output bit precision. We also present a 2D weight broadcast dataflow which exploits the multi-threaded nature of the PE…
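To illustrate why a log-based PE can be cheaper than a linear multiplier, the sketch below shows the general idea of log-domain multiplication: if a weight is stored as a sign and a power-of-two exponent, multiplying an integer activation by it reduces to a bit shift. This is a minimal illustration only, not the NeuroMAX design. The base-2 quantization and the helper names (`log2_quantize`, `shift_multiply`, `log_dot`) are assumptions made for this example, and the abstract does not specify the log base, the bit widths, or how the threads are organized.

```python
import math
from typing import Sequence, Tuple


def log2_quantize(w: float) -> Tuple[int, int]:
    """Approximate w as sign * 2**exponent (assumed base-2 scheme).

    Returns (sign, exponent); sign is 0 when w is exactly zero.
    """
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exponent = round(math.log2(abs(w)))
    return sign, exponent


def shift_multiply(x: int, sign: int, exponent: int) -> int:
    """Multiply integer activation x by sign * 2**exponent using shifts only."""
    if sign == 0:
        return 0
    # A non-negative exponent is a left shift; a negative one is a right shift.
    shifted = x << exponent if exponent >= 0 else x >> -exponent
    return sign * shifted


def log_dot(activations: Sequence[int], weights: Sequence[float]) -> int:
    """Dot product in which every multiply is replaced by a shift."""
    return sum(
        shift_multiply(x, *log2_quantize(w))
        for x, w in zip(activations, weights)
    )


if __name__ == "__main__":
    # Weights that are exact powers of two lose nothing in quantization.
    print(log_dot([12, 8, 5], [0.25, -4.0, 2.0]))
```

Weights that are not exact powers of two are rounded to the nearest exponent, so a real design has to trade quantization error against shifter cost; the example makes no claim about how the paper handles that trade-off.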
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Adversarial Robustness in Machine Learning
Methods: Convolution
