Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search

Mingzhu Shen; Feng Liang; Ruihao Gong; Yuhang Li; Chuming Li; Chen Lin; Fengwei Yu; Junjie Yan; Wanli Ouyang

arXiv:2010.04354 · cs.CV · September 29, 2021

TL;DR

This paper introduces OQAT, a novel framework combining architecture search and quantization with a shared step size and bit-inheritance, achieving state-of-the-art low-bit neural network performance with reduced training time.

Contribution

The paper proposes a new framework, OQAT, that effectively combines neural architecture search with quantization, including a bit-inheritance scheme, to improve low-bit neural network accuracy and efficiency.

Findings

01. OQATNets achieve state-of-the-art accuracy under various bit-widths.

02. OQAT-2bit-M surpasses the 2-bit MobileNetV3 counterpart by 9% in accuracy with less computation.

03. The framework reduces training time and improves quantization accuracy for low-bit networks.

Abstract

Quantization Neural Networks (QNNs) have attracted a lot of attention due to their high efficiency. To enhance quantization accuracy, prior works mainly focus on designing advanced quantization algorithms but still fail to achieve satisfactory results in the extremely low-bit case. In this work, we take an architecture perspective to investigate the potential of high-performance QNNs. We therefore propose to combine Network Architecture Search methods with quantization to enjoy the merits of both. However, a naive combination inevitably faces unacceptable time consumption or unstable training. To alleviate these problems, we first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models. Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which…
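
The abstract names two mechanisms, a quantizer step size and bit inheritance, without spelling out their form. As a rough illustration only, the sketch below assumes a uniform symmetric quantizer with a single step size, and assumes that moving to a lower bit-width starts from the higher-bit step size doubled so that the representable range stays about the same. Both choices, and the names `quantize_codes`, `dequantize`, and `inherit_lower_bit`, are hypothetical and are not taken from the OQAT code base.

```python
def quantize_codes(values, step, bits):
    """Map real values to signed integer codes on a uniform grid of width `step`.

    Assumption for this sketch: symmetric signed range [-2^(bits-1), 2^(bits-1) - 1].
    """
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    codes = []
    for v in values:
        q = round(v / step)                # nearest grid point
        codes.append(max(qmin, min(qmax, q)))  # clip to the representable range
    return codes


def dequantize(codes, step):
    """Recover the quantized real values from integer codes."""
    return [q * step for q in codes]


def inherit_lower_bit(step, bits):
    """Hypothetical bit-inheritance initialization: drop one bit, double the step.

    With one bit fewer there are half as many levels, so doubling the step
    keeps the covered range roughly unchanged.
    """
    return step * 2.0, bits - 1


weights = [-0.5, -0.13, 0.0, 0.07, 0.3]

step4, bits4 = 0.125, 4
codes4 = quantize_codes(weights, step4, bits4)

# Initialize the 3-bit quantizer from the 4-bit one instead of from scratch.
step3, bits3 = inherit_lower_bit(step4, bits4)
codes3 = quantize_codes(weights, step3, bits3)
values3 = dequantize(codes3, step3)
```

In a training setting the step size would be a learnable parameter and the inherited value would only be a starting point for further fine-tuning; the sketch shows the initialization arithmetic, not the training loop.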

Code & Models

Repositories

LaVieEnRoseSMZ/OQA (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Brain Tumor Detection and Classification

Methods: Depthwise Convolution · ReLU6 · Pointwise Convolution · Batch Normalization · Depthwise Separable Convolution · Average Pooling · Hard Swish · Sigmoid Activation · Dropout