Automatic Mixed-Precision Quantization Search of BERT
Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin

TL;DR
This paper introduces an automatic mixed-precision quantization and pruning framework for BERT that optimizes model size and performance, leveraging neural architecture search for fine-grained parameter management.
Contribution
It presents a novel differentiable neural architecture search-based method for automatic mixed-precision quantization and subgroup-wise pruning of BERT models.
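To make the idea of a differentiable bit-width search concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the candidate bit widths, the symmetric uniform quantizer, and every function name below are assumptions chosen for illustration. Each weight group gets one learnable logit per candidate bit width; during search the group's weight is the softmax-weighted mixture of its quantized versions, and afterwards the bit width with the largest logit is kept.

```python
import numpy as np

def quantize(w, bits):
    """Symmetric uniform quantization of w (assumed quantizer, bits >= 2)."""
    levels = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / levels
    if scale == 0:
        return w.copy()
    return np.round(w / scale) * scale

def softmax(x):
    e = np.exp(x - np.max(x))              # shift for numerical stability
    return e / e.sum()

def mixed_precision_weight(w, alphas, candidate_bits):
    """Relaxed weight used during search: a softmax-weighted mixture of the
    quantized versions of w, one per candidate bit width."""
    probs = softmax(alphas)
    return sum(p * quantize(w, b) for p, b in zip(probs, candidate_bits))

def expected_bits(alphas, candidate_bits):
    """Expected bit width under the current logits; a term like this could be
    added to the loss to penalize model size (hypothetical penalty)."""
    return float(np.dot(softmax(alphas), candidate_bits))

def select_bits(alphas, candidate_bits):
    """After search, keep the candidate with the largest logit."""
    return candidate_bits[int(np.argmax(alphas))]
```

In a real search the logits would be trained by gradient descent alongside the weights, typically with a straight-through estimator for the rounding step; this sketch only shows the forward computation for a single weight group.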
Findings
Outperforms baseline models in downstream tasks.
Achieves significant model size reduction with maintained accuracy.
Enables extremely lightweight models when combined with other methods.
Abstract
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks. However, these models usually contain millions of parameters, which prevents their practical deployment on resource-constrained devices. Knowledge distillation, weight pruning, and quantization are known to be the main directions in model compression. However, compact models obtained through knowledge distillation may suffer from a significant accuracy drop even for a relatively small compression ratio. On the other hand, there are only a few quantization attempts that are specifically designed for natural language processing tasks. They suffer from a small compression ratio or a large error rate, since manual setting of hyper-parameters is required and fine-grained subgroup-wise quantization is not supported. In this paper, we propose an automatic mixed-precision…
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Linear Layer · Layer Normalization · Residual Connection · Softmax · WordPiece · Adam
