HaShiFlex: A High-Throughput Hardened Shifter DNN Accelerator with Fine-Tuning Flexibility

Jonathan Herbst (1), Michael Pellauer (2), and Sherief Reda (1) ((1) Brown University, (2) NVIDIA)

arXiv:2512.12847 · cs.AR · December 16, 2025

Open Access

TL;DR

HaShiFlex is a high-throughput, energy-efficient DNN accelerator that embeds most network layers directly in hardware, uses power-of-two weight quantization to replace multiplications with simple rewiring, and keeps a reprogrammable final layer for fine-tuning, with a reported 20x inference throughput gain over GPUs.

Contribution

The paper presents a novel hardware architecture that combines embedded layer processing, power-of-two quantization, and a reprogrammable final layer for high throughput and fine-tuning capability.

Findings

01. Achieves a 20x inference throughput increase over GPUs in the configuration that retains fine-tuning.

02. Processes 1.21 million images/sec with full fine-tuning.

03. Reaches 4 million images/sec without post-deployment fine-tuning.

Abstract

We introduce a high-throughput neural network accelerator that embeds most network layers directly in hardware, minimizing data transfer and memory usage while preserving a degree of flexibility via a small neural processing unit for the final classification layer. By leveraging power-of-two (Po2) quantization for weights, we replace multiplications with simple rewiring, effectively reducing each convolution to a series of additions. This streamlined approach offers high-throughput, energy-efficient processing, making it highly suitable for applications where model parameters remain stable, such as continuous sensing tasks at the edge or large-scale data center deployments. Furthermore, by including a strategically chosen reprogrammable final layer, our design achieves high throughput without sacrificing fine-tuning capabilities. We implement this accelerator in a 7nm ASIC flow using…
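
The Po2 idea in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' implementation: the function names `quantize_po2` and `shift_add_dot`, the exponent range, and the fixed-point format are all assumptions chosen for illustration. It rounds each weight to a signed power of two and then computes a dot product using only shifts and additions. In hardware, each shift amount is fixed per weight, so the shift becomes wiring rather than a runtime operation.

```python
import math

def quantize_po2(w, min_exp=-7, max_exp=0):
    """Round a real weight to sign * 2**exp and return (sign, exp).

    The exponent range is an illustrative assumption, not taken from the paper.
    A zero weight is encoded as (0, 0).
    """
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))          # nearest exponent in the log domain
    exp = max(min_exp, min(max_exp, exp))   # clamp to the representable range
    return sign, exp

def shift_add_dot(activations, po2_weights, frac_bits=7):
    """Dot product of integer activations with Po2 weights, using shifts and adds only.

    The result is a fixed-point integer with `frac_bits` fractional bits.
    Requires every exponent to satisfy exp >= -frac_bits so shifts stay non-negative.
    """
    acc = 0
    for a, (sign, exp) in zip(activations, po2_weights):
        if sign == 0:
            continue                        # zero weight contributes nothing
        term = a << (frac_bits + exp)       # multiply by 2**exp via a left shift
        acc += term if sign > 0 else -term  # the weight's sign selects add or subtract
    return acc

weights = [quantize_po2(w) for w in [0.5, -0.25, 1.0, 0.0]]
result = shift_add_dot([4, 8, 3, 5], weights)
print(result / (1 << 7))  # 4*0.5 - 8*0.25 + 3*1.0 = 3.0
```

Because every weight is a signed power of two, a convolution built from such dot products needs no multipliers, which is the simplification the abstract describes as reducing each convolution to a series of additions.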

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Domain Adaptation and Few-Shot Learning