VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization

Cheng Gong; Yao Chen; Ye Lu; Tao Li; Cong Hao; Deming Chen

arXiv:2005.08501·cs.CV·June 11, 2020

TL;DR

VecQ is a vectorized weight quantization method for DNNs that minimizes direct quantization loss, which improves the accuracy of the quantized model across a range of datasets and tasks.

Contribution

The paper proposes a new metric, Vector Loss, and builds the VecQ quantization approach on it to guarantee minimal direct quantization loss and better model accuracy. It also speeds up quantization during training with a parameterized probability estimation method.

Findings

01 Outperforms state-of-the-art quantization methods in accuracy

02 Achieves up to 16× weight size reduction with maintained performance

03 Effective across diverse datasets and tasks

Abstract

Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs. However, the trade-off between the quantization bitwidth and final accuracy is complex and non-convex, which makes it difficult to optimize directly. Minimizing the direct quantization loss (DQL) of the coefficient data is an effective local optimization method, but previous works often neglect accurate control of the DQL, resulting in a higher loss of final DNN model accuracy. In this paper, we propose a novel metric called Vector Loss. Based on this new metric, we develop a new quantization solution called VecQ, which can guarantee minimal direct quantization loss and better model accuracy. In addition, to speed up the proposed quantization process during model training, we accelerate it with a parameterized probability estimation method…
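To make the notion of direct quantization loss concrete, the sketch below measures it for a plain symmetric uniform quantizer at several bitwidths. This is not the VecQ algorithm and does not implement Vector Loss; the quantizer, the function names, the random weights, and the choice of squared L2 distance as the DQL measure are all assumptions made for illustration. The last helper shows the arithmetic behind a 16× size reduction, assuming a 32-bit floating-point baseline and 2-bit weights.

```python
import random


def uniform_quantize(weights, bits):
    """Symmetric uniform quantizer (illustrative stand-in, NOT VecQ)."""
    levels = 2 ** (bits - 1) - 1  # 1 for 2 bits, 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0 or levels == 0:
        return [0.0 for _ in weights]
    scale = max_abs / levels
    # Snap every weight to the nearest multiple of the scale.
    return [round(w / scale) * scale for w in weights]


def direct_quantization_loss(weights, quantized):
    """Assumed DQL measure: squared L2 distance between the two vectors."""
    return sum((w - q) ** 2 for w, q in zip(weights, quantized))


def compression_ratio(full_bits, quant_bits):
    """Weight-size reduction when each weight shrinks from full_bits to quant_bits."""
    return full_bits / quant_bits


random.seed(0)
# Hypothetical weights: 1000 samples from a standard normal distribution.
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

losses = {
    bits: direct_quantization_loss(weights, uniform_quantize(weights, bits))
    for bits in (2, 4, 8)
}

for bits, loss in losses.items():
    print(f"{bits}-bit DQL: {loss:.4f}, size reduction: {compression_ratio(32, bits):.0f}x")
```

Under these assumptions the loss grows as the bitwidth shrinks, which is the trade-off the abstract describes: a lower bitwidth buys a larger size reduction at the cost of a larger quantization error to control.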

Code & Models

Repositories

GongCheng1919/VecQ (tf, Official)
