TL;DR
This paper introduces a comprehensive taxonomy and benchmarking framework for precision-scalable MAC arrays in DNN accelerators, enabling fair comparison and guiding design choices for energy-efficient AI hardware.
Contribution
The paper proposes a precision-enhanced for-loop representation for DNN dataflows, builds a systematic PSMA taxonomy on top of it, and benchmarks 72 architectures across various configurations.
Findings
Energy- and area-efficiency insights for different PSMA designs
Design guidelines for precision scalability in DNN accelerators
Benchmarking results in a 28nm technology across multiple configurations
Abstract
Reduced-precision and variable-precision multiply-accumulate (MAC) operations provide opportunities to significantly improve the energy efficiency and throughput of DNN accelerators with little or no loss in algorithmic performance, paving the way towards deploying AI applications on resource-constrained edge devices. Accordingly, various precision-scalable MAC array (PSMA) architectures have been proposed recently. However, it is difficult to make a fair comparison between these alternatives, as each proposed PSMA is demonstrated in a different system and technology. This work aims to provide a clear view of the design space of PSMAs and offer insights for selecting the optimal architecture based on designers' needs. First, we introduce a precision-enhanced for-loop representation for DNN dataflows. Next, we use this new representation towards a comprehensive PSMA taxonomy, capable of systematically…
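To make the idea of adding precision to a for-loop dataflow description more concrete, the following minimal Python sketch shows one way a MAC loop could be extended with two extra loops over operand bit-chunks. It is an illustration under stated assumptions, not the representation defined in the paper: the function names `split_chunks` and `mac_precision_loops`, the unsigned 8-bit operands, and the 2-bit chunk size are all hypothetical choices made for this example.

```python
def split_chunks(value, total_bits, chunk_bits):
    """Split an unsigned integer into little-endian chunks of chunk_bits each."""
    mask = (1 << chunk_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, chunk_bits)]


def mac_precision_loops(weights, activations, total_bits=8, chunk_bits=2):
    """Dot product written with explicit loops over operand bit-chunks.

    Assumption: operands are unsigned and fit in total_bits; signed
    arithmetic would need extra handling that is not shown here.
    """
    acc = 0
    # Ordinary dataflow loop over the elements being accumulated.
    for w, a in zip(weights, activations):
        w_chunks = split_chunks(w, total_bits, chunk_bits)
        a_chunks = split_chunks(a, total_bits, chunk_bits)
        # Two additional "precision" loops: one per operand.
        for i, wc in enumerate(w_chunks):
            for j, ac in enumerate(a_chunks):
                # Low-precision sub-product, shifted to its bit position.
                acc += (wc * ac) << ((i + j) * chunk_bits)
    return acc


weights = [3, 200, 17]
activations = [255, 4, 90]
result = mac_precision_loops(weights, activations)
reference = sum(w * a for w, a in zip(weights, activations))
print(result == reference)
```

In this sketch the two inner loops are what a precision-scalable array could unroll spatially or iterate over in time; which loops are unrolled, and where the shift-and-add happens, is exactly the kind of design choice a taxonomy of such arrays would have to distinguish.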
