FastBERT: a Self-distilling BERT with Adaptive Inference Time

Weijie Liu; Peng Zhou; Zhe Zhao; Zhiruo Wang; Haotang Deng; Qi Ju

arXiv:2004.02178·cs.CL·April 30, 2020·57 cites

TL;DR

FastBERT is a self-distilling, speed-tunable version of BERT that adapts inference time to resource demands, significantly improving efficiency with minimal performance loss across multiple datasets.

Contribution

The paper introduces FastBERT, a novel self-distilling BERT variant with adaptive inference time, enabling flexible speed-performance tradeoffs in practical applications.

Findings

01 Achieves a 1x to 12x speedup over BERT, depending on the chosen speedup threshold.

02 Maintains high performance with minimal accuracy loss.

03 Effective across twelve English and Chinese datasets.

Abstract

Pre-trained language models like BERT have proven to be highly performant. However, they are often too computationally expensive for practical scenarios, because such heavy models are difficult to deploy with limited resources. To improve their efficiency while preserving model performance, we propose FastBERT, a novel speed-tunable model with adaptive inference time. The inference speed can be flexibly adjusted under varying demands, while redundant computation on samples is avoided. Moreover, the model adopts a unique self-distillation mechanism during fine-tuning, which enables greater computational efficiency with minimal loss in performance. Our model achieves promising results on twelve English and Chinese datasets. Depending on the speedup threshold chosen for the speed-performance tradeoff, it runs from 1 to 12 times faster than BERT.
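The adaptive inference idea in the abstract can be illustrated with a small sketch. The abstract does not spell out the exit rule, so the code below assumes one common formulation: a classifier is attached after every layer, and a sample stops early once the normalized entropy of its prediction falls below a threshold. The names `normalized_entropy`, `adaptive_predict`, `toy_layer`, and `toy_classifier` are hypothetical and chosen for illustration; the toy layers stand in for Transformer layers and are not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy divided by log(num_classes), so the result lies in [0, 1]."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def adaptive_predict(
    x: float,
    layers: Sequence[Callable[[float], float]],
    classifiers: Sequence[Callable[[float], List[float]]],
    threshold: float,
) -> Tuple[List[float], int]:
    """Run layers in order and exit as soon as the prediction is confident.

    Returns the class probabilities and the number of layers executed.
    A higher threshold exits earlier (faster, possibly less accurate).
    """
    hidden = x
    probs: List[float] = []
    for depth, (layer, classifier) in enumerate(zip(layers, classifiers), start=1):
        hidden = layer(hidden)
        probs = classifier(hidden)
        if normalized_entropy(probs) < threshold:
            return probs, depth  # early exit: skip the remaining layers
    return probs, len(layers)


# Assumption: toy stand-ins for real layers. Each "layer" doubles a scalar
# margin, and each "classifier" turns that margin into two-class probabilities.
def toy_layer(hidden: float) -> float:
    return hidden * 2.0


def toy_classifier(hidden: float) -> List[float]:
    return softmax([hidden, -hidden])


layers = [toy_layer] * 4
classifiers = [toy_classifier] * 4

for threshold in (0.6, 0.5, 0.1, 0.0):
    _, depth = adaptive_predict(0.5, layers, classifiers, threshold)
    print(f"threshold={threshold}: exited after {depth} layer(s)")
```

In this sketch the threshold is the single knob for the speed-performance tradeoff: a threshold of 0.0 never exits early and always runs the full stack, while larger values let confident samples leave after fewer layers.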

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece