Stella Nera: A Differentiable Maddness-Based Hardware Accelerator for Efficient Approximate Matrix Multiplication

Jannis Schönleber; Lukas Cavigelli; Matteo Perotti; Luca Benini; Renzo Andri

arXiv:2311.10207 · cs.AR · July 28, 2025 · 1 citation

TL;DR

Stella Nera is a novel hardware accelerator that leverages Maddness, a hash-based approximation of matrix multiplication, achieving high energy efficiency and enabling gradient-based optimization for AI models.

Contribution

It introduces Stella Nera, the first Maddness-based accelerator with differentiable approximation, significantly improving energy efficiency and supporting end-to-end training.

Findings

01

Energy efficiency of 161 TOp/s/W@0.55V, 25x better than traditional accelerators.

02

Achieves 92.5% Top-1 accuracy on CIFAR-10 with end-to-end training.

03

First to integrate Maddness with differentiable approximation in hardware.

Abstract

Artificial intelligence has surged in recent years, with advancements in machine learning rapidly impacting nearly every area of life. However, the growing complexity of these models has far outpaced advancements in available hardware accelerators, leading to significant computational and energy demands, primarily due to matrix multiplications, which dominate the compute workload. Maddness (i.e., Multiply-ADDitioN-lESS) presents a hash-based version of product quantization, which renders matrix multiplications into lookups and additions, eliminating the need for multipliers entirely. We present Stella Nera, the first Maddness-based accelerator achieving an energy efficiency of 161 TOp/s/W@0.55V, 25x better than conventional MatMul accelerators due to its small components and reduced computational complexity. We further enhance Maddness with a differentiable approximation, allowing for…
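To make the "lookups and additions" idea in the abstract concrete, the following is a minimal NumPy sketch of a Maddness-style approximate matrix product. It is an illustration under stated assumptions, not the authors' implementation or the accelerator's datapath: the function names (`fit_tree`, `encode`, `approx_matmul`), the split-dimension heuristic, the median thresholds, and the sizes (4 codebooks, depth-4 trees with 16 buckets) are all hypothetical choices made for this example. The point it shows is that multiplications happen only once, when the lookup tables are built from the weights; each input row is then handled with threshold comparisons, table reads, and accumulation.

```python
import numpy as np

def encode(X, split_dims, thresholds):
    """Hash each row to a bucket index by walking a balanced binary tree."""
    codes = np.zeros(X.shape[0], dtype=np.int64)
    for dim, level_thr in zip(split_dims, thresholds):
        # One comparison per level: go right if the feature exceeds the node threshold.
        codes = 2 * codes + (X[:, dim] > level_thr[codes])
    return codes

def fit_tree(X, depth=4):
    """Learn split dims, thresholds, and bucket prototypes for one subspace."""
    codes = np.zeros(X.shape[0], dtype=np.int64)
    split_dims, thresholds = [], []
    for level in range(depth):
        n_nodes = 2 ** level
        # Assumed heuristic: split on the dimension with the largest within-bucket spread.
        spread = np.zeros(X.shape[1])
        for node in range(n_nodes):
            rows = X[codes == node]
            if len(rows):
                spread += ((rows - rows.mean(axis=0)) ** 2).sum(axis=0)
        dim = int(np.argmax(spread))
        # Assumed heuristic: each node splits its rows at the median of that dimension.
        level_thr = np.zeros(n_nodes)
        for node in range(n_nodes):
            rows = X[codes == node]
            if len(rows):
                level_thr[node] = np.median(rows[:, dim])
        split_dims.append(dim)
        thresholds.append(level_thr)
        codes = 2 * codes + (X[:, dim] > level_thr[codes])
    # Each bucket is represented by the mean of the training rows hashed into it.
    prototypes = np.zeros((2 ** depth, X.shape[1]))
    for bucket in range(2 ** depth):
        rows = X[codes == bucket]
        if len(rows):
            prototypes[bucket] = rows.mean(axis=0)
    return split_dims, thresholds, prototypes

def approx_matmul(A, B, A_train, n_codebooks=4, depth=4):
    """Approximate A @ B with per-subspace hashing, table lookups, and additions."""
    sub = A.shape[1] // n_codebooks  # assumes the inner dimension divides evenly
    out = np.zeros((A.shape[0], B.shape[1]))
    for c in range(n_codebooks):
        cols = slice(c * sub, (c + 1) * sub)
        split_dims, thresholds, prototypes = fit_tree(A_train[:, cols], depth)
        lut = prototypes @ B[cols, :]  # offline step: the only multiplications
        codes = encode(A[:, cols], split_dims, thresholds)  # online: comparisons only
        out += lut[codes]  # online: table lookup and accumulation
    return out

rng = np.random.default_rng(0)
A_train = rng.normal(size=(2048, 16))  # synthetic calibration data (assumption)
A = rng.normal(size=(8, 16))
B = rng.normal(size=(16, 4))
approx = approx_matmul(A, B, A_train)
print("relative error:", np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B))
```

In this sketch the tree and the tables are rebuilt on every call for brevity; in practice they would be computed once per weight matrix and reused. The hard threshold comparisons in `encode` are also what makes the plain method non-differentiable, which is the limitation the abstract's differentiable approximation addresses.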

Code & Models

Repositories

joennlae/halutmatmul
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices