Exploration of Unary Arithmetic-Based Matrix Multiply Units for Low Precision DL Accelerators

Prabhu Vellaisamy; Harideep Nair; Di Wu; Shawn Blanton; and John Paul Shen

arXiv:2602.00838·cs.AR·February 3, 2026

TL;DR

This paper evaluates unary GEMM designs for low precision deep learning inference, comparing them to binary GEMM, and explores their energy efficiency and optimal use cases in edge AI accelerators.

Contribution

It provides a comprehensive evaluation of three recent unary GEMM proposals, analyzing their tradeoffs and potential for energy-efficient low precision DL inference.

Findings

1. Unary GEMM designs show promising energy efficiency for low precision DL.

2. Optimal bit-widths and matrix sizes vary across designs, indicating specific use-case advantages.

3. Unary GEMM can be effectively integrated into future edge AI accelerators.

Abstract

General matrix multiplication (GEMM) is a fundamental operation in deep learning (DL). With DL moving increasingly toward low precision, recent works have proposed novel unary GEMM designs as an alternative to conventional binary GEMM hardware. A rigorous evaluation of recent unary and binary GEMM designs is needed to assess the potential of unary hardware for future DL compute. This paper focuses on unary GEMM designs for integer-based DL inference and performs a detailed evaluation of the three latest unary design proposals, namely uGEMM, tuGEMM, and tubGEMM, by comparing them to a conventional binary GEMM. Rigorous post-synthesis evaluations beyond prior works are performed across varying bit-widths and matrix sizes to assess the designs' tradeoffs and determine optimal sweet spots. Further, we perform weight sparsity analysis across eight pretrained convolutional neural networks (CNNs)…
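To make the idea of unary arithmetic concrete, the following is a minimal Python sketch of an integer GEMM in which one operand is streamed as a train of unit pulses while the other stays in binary. It is a generic illustration only and does not model uGEMM, tuGEMM, or tubGEMM specifically; the function name `unary_binary_gemm` and the cycle-counting rule (one cycle per pulse, with all rows of a column streamed in parallel) are assumptions introduced here.

```python
import numpy as np

def unary_binary_gemm(a, b):
    """Hypothetical sketch: compute a @ b by streaming entries of `a` as
    unary pulse trains and accumulating binary rows of `b`.
    Returns (product, cycles), where cycles is an assumed latency model."""
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"

    out = np.zeros((m, n), dtype=np.int64)
    cycles = 0
    for j in range(k):                      # one outer product per column of a
        col, row = a[:, j], b[j, :]
        remaining = np.abs(col)             # pulses left to emit per row
        sign = np.sign(col)
        steps = int(remaining.max())        # longest pulse train sets latency
        for _ in range(steps):
            active = (remaining > 0).astype(np.int64)
            # Each active pulse adds (or subtracts) one copy of the binary row.
            out += np.outer(sign * active, row)
            remaining -= active
        cycles += steps
    return out, cycles

a = np.array([[2, -1], [0, 3]])
b = np.array([[1, 2], [3, 4]])
product, cycles = unary_binary_gemm(a, b)
print(product)
print(cycles)
```

In this sketch a multiplier is replaced by repeated additions, and the cycle count grows with operand magnitude rather than bit-width alone, so zero or small-magnitude values finish sooner. That is one intuition for why low precision and weight sparsity are relevant when evaluating unary designs.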

Taxonomy

Topics: Low-power high-performance VLSI design · Stochastic Gradient Optimization Techniques · Numerical Methods and Algorithms