Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators

Jan Klhufek; Miroslav Safar; Vojtech Mrazek; Zdenek Vasicek; Lukas Sekanina

arXiv:2404.05368·cs.AR·July 23, 2025·1 cites

Open Access

TL;DR

This paper demonstrates that combining mixed quantization schemes with optimized mapping strategies in hardware-aware CNN accelerators can significantly improve energy efficiency and memory usage without sacrificing accuracy.

Contribution

The paper extends the Timeloop mapping tool to support mixed quantization and proposes an optimization algorithm for layer-specific quantization and mapping.

Findings

01. Energy savings up to 37% on MobileNet models
02. Supports mixed quantization in mapping tool
03. Improves accuracy-energy-memory trade-offs

Abstract

Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including a weight quantization strategy (i.e., data types and bit-widths) and mapping (i.e., placement and scheduling of DNN elementary operations on hardware units of the accelerator). We show that enabling rich mixed quantization schemes during the implementation can open a previously hidden space of mappings that utilize the hardware resources more effectively. CNNs utilizing quantized weights and activations and suitable mappings can significantly improve trade-offs among the accuracy, energy, and memory requirements compared to less carefully optimized CNN implementations. To find, analyze, and exploit these mappings, we: (i) extend a general-purpose state-of-the-art mapping tool (Timeloop) to support mixed quantization, which is not…
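To make the memory side of this trade-off concrete, the following is a minimal sketch of how per-layer weight bit-widths change the weight footprint of a small network. It is an illustration only and not the paper's method or the Timeloop API: the `ConvLayer` class, the `weight_memory_bytes` helper, the layer shapes, and the chosen bit-widths are all hypothetical, and the model assumes standard (non-depthwise) convolutions while ignoring biases, activations, and any packing overhead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvLayer:
    """Hypothetical description of one standard convolution layer."""
    name: str
    in_channels: int
    out_channels: int
    kernel_size: int
    weight_bits: int  # assumed per-layer weight bit-width

    def weight_count(self) -> int:
        # Standard convolution: one k x k kernel per (input, output) channel pair.
        return self.in_channels * self.out_channels * self.kernel_size ** 2


def weight_memory_bytes(layers: List[ConvLayer]) -> int:
    """Total weight storage in bytes, rounded up to a whole byte."""
    total_bits = sum(layer.weight_count() * layer.weight_bits for layer in layers)
    return (total_bits + 7) // 8


# Uniform 8-bit quantization versus a mixed scheme that lowers only the
# larger second layer to 4 bits (illustrative values, not from the paper).
uniform = [ConvLayer("conv1", 3, 32, 3, 8), ConvLayer("conv2", 32, 64, 3, 8)]
mixed = [ConvLayer("conv1", 3, 32, 3, 8), ConvLayer("conv2", 32, 64, 3, 4)]

print(weight_memory_bytes(uniform))  # 19296
print(weight_memory_bytes(mixed))    # 10080
```

In this toy case, lowering one layer to 4 bits roughly halves the weight storage. Whether such a choice also reduces energy depends on the mapping and the accelerator, which is the interaction the abstract describes and which this sketch does not model.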

Code & Models

Repositories

ehw-fit/timeloop-with-quantization (Official)

Taxonomy

Topics: Neural Networks and Applications