tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI

Harideep Nair, Prabhu Vellaisamy, Albert Chen, Joseph Finn, Anna Li, Manav Trivedi, and John Paul Shen

arXiv:2412.17966·cs.AR·December 25, 2024

TL;DR

tuGEMM is a temporal-coding unary GEMM architecture that performs exact computation with high area and power efficiency, making it suitable for low-power edge AI applications.

Contribution

The paper presents a temporal-coding-based unary GEMM architecture in two variants, serial and parallel, and demonstrates significant area-power efficiency improvements over stochastic unary systems at low precisions.

Findings

01

Significant area-power efficiency improvements over state-of-the-art stochastic unary systems, especially at low precisions.

02

Incurs just 0.03 mm^2 and 9 mW for 4-bit computations (post-synthesis, 45 nm CMOS).

03

Suitable for power-constrained edge AI devices.

Abstract

General matrix multiplication (GEMM) is a ubiquitous computing kernel for data processing in diverse applications, including artificial intelligence (AI) and deep learning (DL). The recent shift towards edge computing has inspired GEMM architectures based on unary computing, which are predominantly stochastic, rate-coded systems. This paper proposes a novel GEMM architecture based on temporal coding, called tuGEMM, that performs exact computation. We introduce two variants of tuGEMM, serial and parallel, with distinct area/power-latency trade-offs. Post-synthesis power-performance-area (PPA) results in 45 nm CMOS are reported for 2-bit, 4-bit, and 8-bit computations. The designs show significant advantages in area-power efficiency over state-of-the-art stochastic unary systems, especially at low precisions, e.g. incurring just 0.03 mm^2 and 9 mW for 4 bits, and 0.01 mm^2 and 4 mW…
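To make the idea of exact unary computation concrete, here is a minimal software sketch in Python. It is not the tuGEMM microarchitecture, which the abstract does not describe. It assumes unsigned operands and a simple temporal encoding in which a value v is a run of v high cycles followed by low cycles. The function names `unary_stream`, `unary_multiply`, and `unary_gemm` are hypothetical and chosen only for illustration. The point is that counting coincident pulses gives the exact product, unlike stochastic rate-coded streams, which only approximate it.

```python
def unary_stream(value, bits):
    """Encode an unsigned integer as `value` high cycles followed by low cycles.

    Assumed encoding for illustration: the stream spans 2**bits - 1 cycles.
    """
    length = (1 << bits) - 1
    if not 0 <= value <= length:
        raise ValueError(f"{value} does not fit in {bits} unsigned bits")
    return [1] * value + [0] * (length - value)


def unary_multiply(a, b, bits):
    """Multiply by counting: replay b's stream once per cycle of a's stream
    and increment the accumulator whenever both pulses are high."""
    acc = 0
    for a_pulse in unary_stream(a, bits):
        for b_pulse in unary_stream(b, bits):
            acc += a_pulse & b_pulse
    return acc


def unary_gemm(A, B, bits):
    """Compute C = A x B with every scalar product done by pulse counting."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [
        [
            sum(unary_multiply(A[i][k], B[k][j], bits) for k in range(inner))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


A = [[1, 2], [3, 0]]
B = [[2, 1], [1, 3]]
print(unary_gemm(A, B, bits=2))  # [[4, 7], [6, 3]]
```

In this sketch each scalar product takes up to (2^bits - 1)^2 counting steps, so the work grows quickly with precision. That is consistent with the abstract's emphasis on low precisions, though the actual cycle counts of the serial and parallel variants are not given in the text above.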
