Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search
Mingzhu Shen, Feng Liang, Ruihao Gong, Yuhang Li, Chuming Li, Chen Lin, Fengwei Yu, Junjie Yan, Wanli Ouyang

TL;DR
This paper introduces OQAT, a novel framework combining architecture search and quantization with a shared step size and bit-inheritance, achieving state-of-the-art low-bit neural network performance with reduced training time.
Contribution
The paper proposes a new framework, OQAT, that effectively combines neural architecture search with quantization, including a bit-inheritance scheme, to improve low-bit neural network accuracy and efficiency.
Findings
OQATNets achieve state-of-the-art accuracy under various bit-widths.
OQAT-2bit-M surpasses its 2-bit MobileNetV3 counterpart by 9% in accuracy with less computation.
The framework reduces training time and enhances quantization accuracy for low-bit networks.
Abstract
Quantization Neural Networks (QNN) have attracted a lot of attention due to their high efficiency. To enhance quantization accuracy, prior works mainly focus on designing advanced quantization algorithms but still fail to achieve satisfactory results in the extremely low-bit case. In this work, we take an architecture perspective to investigate the potential of high-performance QNN. We therefore propose to combine Network Architecture Search methods with quantization to enjoy the merits of both. However, a naive combination inevitably faces unacceptable time consumption or unstable training. To alleviate these problems, we first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models. Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which…
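The abstract names two mechanisms, a quantization step size and a bit-inheritance scheme, without giving their formulas. The following is a minimal illustrative sketch, not the authors' implementation. It assumes symmetric uniform quantization with a per-tensor step size, and it assumes that inheriting to one fewer bit keeps the weights and doubles the step size so that the representable range is roughly preserved. The function names `quantize` and `inherit_lower_bit`, the step value 0.02, and the random weights are all hypothetical. The architecture-search side (sharing one step size across candidate sub-networks) is not modeled here.

```python
import numpy as np

def quantize(w, step, bits):
    """Symmetric uniform fake-quantization: snap w to integer multiples of step."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(w / step), qmin, qmax) * step

def inherit_lower_bit(step, bits):
    """Assumed bit-inheritance initialisation: drop one bit, double the step size."""
    return 2.0 * step, bits - 1

# Hypothetical weights and a hypothetical 4-bit step size.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=1000)
step4, bits4 = 0.02, 4

# Initialise the 3-bit quantizer from the 4-bit one instead of from scratch.
step3, bits3 = inherit_lower_bit(step4, bits4)

w4 = quantize(w, step4, bits4)  # at most 16 distinct levels
w3 = quantize(w, step3, bits3)  # at most 8 distinct levels, same lower bound
```

Under these assumptions the lower clipping bound is identical at both bit-widths (-8 × 0.02 = -4 × 0.04 = -0.16), which is one plausible reason an inherited quantizer would be a reasonable starting point for fine-tuning at the lower bit-width.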
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Brain Tumor Detection and Classification
Methods: Depthwise Convolution · ReLU6 · Pointwise Convolution · Batch Normalization · Depthwise Separable Convolution · Average Pooling · Hard Swish · Sigmoid Activation · Dropout
