AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models

Yichun Yin; Cheng Chen; Lifeng Shang; Xin Jiang; Xiao Chen; Qun Liu

arXiv:2107.13686·cs.CL·July 30, 2021

TL;DR

AutoTinyBERT uses one-shot neural architecture search (NAS) to automatically search the architecture hyper-parameters of tiny pre-trained language models, improving both efficiency and accuracy on GLUE and SQuAD.

Contribution

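This paper introduces AutoTinyBERT, a method that uses one-shot NAS to automatically search the architecture hyper-parameters of tiny pre-trained language models (PLMs). The resulting models outperform NAS-BERT and distillation-based baselines on GLUE and SQuAD, and the approach speeds up the development of tiny PLMs for different latency constraints.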

Findings

1. Outperforms NAS-BERT and distillation-based models on GLUE and SQuAD.
2. Provides an adaptive, efficient approach for hyper-parameter search in tiny PLMs.
3. Enables faster development of optimized models for resource-constrained devices.

Abstract

Pre-trained language models (PLMs) have achieved great success in natural language processing. Most PLMs follow the default setting of architecture hyper-parameters in BERT (Devlin et al., 2019); for example, the hidden dimension is a quarter of the intermediate dimension in the feed-forward sub-networks. Few studies have explored the design of architecture hyper-parameters in BERT, especially for more efficient PLMs of tiny sizes, which are essential for practical deployment on resource-constrained devices. In this paper, we adopt one-shot Neural Architecture Search (NAS) to automatically search architecture hyper-parameters. Specifically, we carefully design the techniques of one-shot learning and the search space to provide an adaptive and efficient way to develop tiny PLMs for various latency constraints. We name our method AutoTinyBERT and evaluate its…
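To make the "default setting" concrete, the sketch below encodes the BERT-base hyper-parameters (hidden size 768, intermediate size 3072, so the hidden dimension is a quarter of the intermediate dimension) and draws random candidates from a small search space. The `ArchConfig` class, the `SEARCH_SPACE` values, and the helper functions are hypothetical names and numbers chosen for illustration; they are not the search space or sampling procedure used by AutoTinyBERT.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchConfig:
    """Architecture hyper-parameters of a BERT-style encoder (illustrative)."""
    num_layers: int
    hidden_size: int
    intermediate_size: int
    num_heads: int


# BERT-base defaults (Devlin et al., 2019): intermediate = 4 * hidden.
BERT_BASE = ArchConfig(num_layers=12, hidden_size=768,
                       intermediate_size=3072, num_heads=12)

# ASSUMPTION: a made-up search space for tiny models; the values are
# illustrative only and are not taken from the paper.
SEARCH_SPACE = {
    "num_layers": [3, 4, 5, 6],
    "hidden_size": [128, 256, 384, 512],
    "intermediate_size": [256, 512, 1024, 2048],
    "num_heads": [2, 4, 8],
}


def sample_arch(rng: random.Random) -> ArchConfig:
    """Draw one candidate by picking each hyper-parameter independently."""
    return ArchConfig(**{name: rng.choice(choices)
                         for name, choices in SEARCH_SPACE.items()})


def follows_default_ratio(cfg: ArchConfig) -> bool:
    """True if the hidden size is a quarter of the intermediate size."""
    return cfg.intermediate_size == 4 * cfg.hidden_size


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        cfg = sample_arch(rng)
        print(cfg, "default ratio:", follows_default_ratio(cfg))
```

Most sampled candidates will not follow the 1:4 ratio, which is the point of searching: the ratio becomes a free choice rather than a fixed convention. A one-shot NAS setup would additionally evaluate each candidate with weights shared from a single trained super-network; that step is omitted here.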

Code & Models

Repositories

huawei-noah/Pretrained-Language-Model (tf, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Neural Network Applications

Methods: Attention Is All You Need · Linear Layer · AutoTinyBERT · Layer Normalization · Linear Warmup With Linear Decay · Residual Connection · DistilBERT · Dense Connections · Softmax