AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models

Xiaofan Zhang; Zongwei Zhou; Deming Chen; Yu Emma Wang

arXiv:2201.08539 · cs.LG · January 24, 2022 · 5 cites



Open Access

TL;DR

AutoDistill is an end-to-end framework that combines neural architecture search and multi-objective optimization to produce hardware-efficient, high-performing NLP models with reduced latency and size.

Contribution

It introduces a novel framework integrating architecture exploration and multi-objective optimization for effective model distillation tailored to hardware constraints.
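As a rough illustration of what multi-objective selection means in this setting, the sketch below keeps only the candidates that are not dominated on two objectives, accuracy and latency. It is a minimal sketch and not the paper's method: the paper uses Bayesian Optimization to drive the search, whereas this code only shows the Pareto-dominance filter. The `Candidate` type, the function names, and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical student architecture with two measured objectives."""
    name: str
    accuracy: float    # higher is better (illustrative value)
    latency_ms: float  # lower is better (illustrative value)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
    strictly_better = a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up candidates, for illustration only.
candidates = [
    Candidate("A", 0.70, 10.0),
    Candidate("B", 0.72, 12.0),
    Candidate("C", 0.69, 11.0),  # dominated by A: less accurate and slower
    Candidate("D", 0.72, 15.0),  # dominated by B: same accuracy, slower
]

front = pareto_front(candidates)
print([c.name for c in front])
```

A and B both survive because neither is better on both objectives at once; choosing between them depends on how the deployment weighs accuracy against latency.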

Findings

01

AutoDistill finds models with up to 3.2% higher pre-trained accuracy and up to 1.44x faster inference than MobileBERT.

02

Distilled models outperform BERT_BASE and other compact models on GLUE and SQuAD.

03

The framework reduces model size significantly while maintaining or improving performance.

Abstract

Recently, large pre-trained models have significantly improved the performance of various Natural Language Processing (NLP) tasks, but they are expensive to serve due to long serving latency and large memory usage. To compress these models, knowledge distillation has attracted an increasing amount of interest as one of the most effective methods for model compression. However, existing distillation methods have not yet addressed the unique challenges of model serving in datacenters, such as handling fast-evolving models, considering serving performance, and optimizing for multiple objectives. To solve these problems, we propose AutoDistill, an end-to-end model distillation framework integrating model architecture exploration and multi-objective optimization for building hardware-efficient NLP pre-trained models. We use Bayesian Optimization to conduct multi-objective Neural Architecture…


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Dense Connections · Softmax · Multi-Head Attention · Knowledge Distillation · Residual Connection · MobileBERT