NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search

Jin Xu; Xu Tan; Renqian Luo; Kaitao Song; Jian Li; Tao Qin; Tie-Yan Liu

arXiv:2105.14444·cs.CL·June 1, 2021

TL;DR

NAS-BERT introduces a neural architecture search-based method to efficiently compress BERT into multiple models of varying sizes and latencies, supporting diverse deployment scenarios while maintaining task-agnostic applicability.

Contribution

The paper presents NAS-BERT, a novel NAS-based approach that produces multiple compressed BERT models of different sizes and latencies, trained on pre-training tasks for broad downstream use.

Findings

1. NAS-BERT outperforms previous compression methods in accuracy.

2. The method generates models suitable for various device constraints.

3. Compressed models are effective across multiple NLP benchmarks.

Abstract

While pre-trained language models (e.g., BERT) have achieved impressive results on a range of natural language processing tasks, they have large numbers of parameters and incur high computational and memory costs, which make them difficult to deploy in real-world settings. Model compression is therefore necessary to reduce the computation and memory cost of pre-trained models. In this work, we aim to compress BERT and address two challenging practical issues: (1) the compression algorithm should be able to output multiple compressed models with different sizes and latencies, in order to support devices with different memory and latency limitations; (2) the algorithm should be downstream-task agnostic, so that the compressed models are generally applicable to different downstream tasks. We leverage techniques in neural architecture search (NAS) and propose NAS-BERT, an…

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · WordPiece · Dropout · Softmax