NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search
Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu

TL;DR
NAS-BERT introduces a neural architecture search-based method to efficiently compress BERT into multiple models of varying sizes and latencies, supporting diverse deployment scenarios while maintaining task-agnostic applicability.
Contribution
The paper presents NAS-BERT, a novel NAS-based approach that produces multiple compressed BERT models of different sizes and latencies, trained on pre-training tasks for broad downstream use.
Findings
NAS-BERT outperforms previous compression methods in accuracy.
The method generates models suitable for various device constraints.
Compressed models are effective across multiple NLP benchmarks.
Abstract
While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them difficult for real-world deployment. Therefore, model compression is necessary to reduce the computation and memory cost of pre-trained models. In this work, we aim to compress BERT and address the following two challenging practical issues: (1) The compression algorithm should be able to output multiple compressed models with different sizes and latencies, in order to support devices with different memory and latency limitations; (2) The algorithm should be downstream task agnostic, so that the compressed models are generally applicable for different downstream tasks. We leverage techniques in neural architecture search (NAS) and propose NAS-BERT, an…
Taxonomy
Methods
Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · WordPiece · Dropout · Softmax
