AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models
Yichun Yin, Cheng Chen, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

TL;DR
AutoTinyBERT uses one-shot neural architecture search to automatically choose the architecture hyper-parameters of tiny pre-trained language models, producing models that outperform NAS-BERT and distillation-based baselines on GLUE and SQuAD.
Contribution
This paper introduces AutoTinyBERT, a method that uses one-shot NAS to search the architecture hyper-parameters of tiny PLMs under various latency constraints. The searched models surpass existing state-of-the-art tiny PLMs, and the approach speeds up development of new ones.
Findings
Outperforms NAS-BERT and distillation-based models on GLUE and SQuAD.
Provides an adaptive, efficient approach for hyper-parameter search in tiny PLMs.
Enables faster development of optimized models for resource-constrained devices.
Abstract
Pre-trained language models (PLMs) have achieved great success in natural language processing. Most PLMs follow the default setting of architecture hyper-parameters in BERT (Devlin et al., 2019), e.g., the hidden dimension is a quarter of the intermediate dimension in feed-forward sub-networks. Few studies have explored the design of architecture hyper-parameters in BERT, especially for the more efficient PLMs with tiny sizes, which are essential for practical deployment on resource-constrained devices. In this paper, we adopt one-shot Neural Architecture Search (NAS) to automatically search architecture hyper-parameters. Specifically, we carefully design the techniques of one-shot learning and the search space to provide an adaptive and efficient way to develop tiny PLMs for various latency constraints. We name our method AutoTinyBERT and evaluate its…
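To make the default ratio mentioned in the abstract concrete, the sketch below counts the parameters of one BERT-style encoder layer for a given hidden and intermediate dimension. It is an illustration, not the paper's code: the function name `encoder_layer_params` and the candidate configurations are hypothetical, and it assumes the standard layer layout (four attention projections with biases, a two-layer feed-forward sub-network, and two layer norms) with the attention width equal to the hidden dimension.

```python
def encoder_layer_params(hidden: int, intermediate: int) -> int:
    """Count trainable parameters in one BERT-style encoder layer.

    Assumes the standard layout: Q, K, V and output projections of size
    hidden x hidden (with biases), a feed-forward sub-network
    hidden -> intermediate -> hidden (with biases), and two layer norms.
    """
    attention = 4 * (hidden * hidden + hidden)
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    layer_norms = 2 * (2 * hidden)  # scale and shift vectors for each norm
    return attention + feed_forward + layer_norms


# Hypothetical candidates: the first two keep the default 1:4 ratio,
# the last two break it, as a hyper-parameter search would be free to do.
candidates = [
    (768, 3072),  # BERT-base setting
    (512, 2048),
    (512, 1024),
    (384, 3072),
]

for hidden, intermediate in candidates:
    ratio = intermediate / hidden
    count = encoder_layer_params(hidden, intermediate)
    print(f"hidden={hidden:4d} intermediate={intermediate:4d} "
          f"ratio={ratio:.1f} params/layer={count:,}")
```

Comparing the rows shows that quite different hidden and intermediate combinations can land on similar parameter budgets, which is the kind of trade-off a search over architecture hyper-parameters can explore instead of fixing the ratio in advance.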
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Neural Network Applications
Methods: Attention Is All You Need · Linear Layer · AutoTinyBERT · Layer Normalization · Linear Warmup With Linear Decay · Residual Connection · DistilBERT · Dense Connections · Softmax
