JAQ: Joint Efficient Architecture Design and Low-Bit Quantization with Hardware-Software Co-Exploration
Mingzi Wang, Yuan Meng, Chen Tang, Weixiang Zhang, Yijian Qin, Yang, Yao, Yingxin Li, Tongtong Feng, Xin Wang, Xun Guan, Zhi Wang, Wenwu Zhu

TL;DR
JAQ presents a joint framework for neural network architecture, quantization, and hardware design, addressing memory and search challenges to optimize performance and efficiency on edge devices.
Contribution
It introduces novel methods for memory-efficient low-bit quantization and fast hardware search, enabling automated co-design across multiple dimensions.
Findings
Achieves 7% higher Top-1 accuracy on ImageNet.
Reduces hardware search time to 0.15 seconds per iteration.
Effectively balances performance and efficiency for resource-constrained devices.
Abstract
The co-design of neural network architectures, quantization precisions, and hardware accelerators offers a promising approach to achieving an optimal balance between performance and efficiency, particularly for model deployment on resource-constrained edge devices. In this work, we propose the JAQ Framework, which jointly optimizes the three critical dimensions. However, effectively automating the design process across the vast search space of those three dimensions poses significant challenges, especially when pursuing extremely low-bit quantization. Specifical, the primary challenges include: (1) Memory overhead in software-side: Low-precision quantization-aware training can lead to significant memory usage due to storing large intermediate features and latent weights for back-propagation, potentially causing memory exhaustion. (2) Search time-consuming in hardware-side: The discrete…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEmbedded Systems Design Techniques · Parallel Computing and Optimization Techniques · CCD and CMOS Imaging Sensors
