PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Saurabh Goyal, Anamitra R. Choudhury, Saurabh M. Raje, Venkatesan T. Chakaravarthy, Yogish Sabharwal, Ashish Verma

TL;DR
PoWER-BERT is a method that accelerates BERT inference by eliminating redundant word-vectors based on self-attention significance, achieving up to 4.5x faster inference with minimal accuracy loss.
Contribution
It introduces a novel strategy for reducing BERT inference time: redundant word-vectors are ranked with a significance measure derived from self-attention and selectively removed, and the number of word-vectors to remove is learned.
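The selection step can be illustrated with a toy sketch in plain Python. It assumes that a word-vector's significance is the total attention it receives, i.e. the column sums of each head's attention matrix, added up over all heads; this is one reading of "significance based on the self-attention mechanism". The names `significance` and `extract` are hypothetical, and a real implementation would operate on batched tensors inside each encoder layer.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")
# A head's attention matrix: row i is the softmax distribution of word-vector i
# over all word-vectors, so entry [i][j] is the attention i pays to j.
Matrix = Sequence[Sequence[float]]


def significance(attn_heads: Sequence[Matrix]) -> List[float]:
    """Total attention each word-vector receives, summed over rows and heads."""
    n = len(attn_heads[0])
    scores = [0.0] * n
    for head in attn_heads:
        for row in head:
            for j, a in enumerate(row):
                scores[j] += a  # column sum: attention received by word-vector j
    return scores


def extract(word_vectors: Sequence[T], scores: Sequence[float], keep: int) -> List[T]:
    """Keep the `keep` most significant word-vectors, in their original order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:keep])
    return [word_vectors[j] for j in kept]


head1 = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]]
head2 = [[0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.4, 0.4, 0.2]]
scores = significance([head1, head2])          # roughly [2.7, 2.6, 0.7]
print(extract(["w0", "w1", "w2"], scores, 2))  # ['w0', 'w1']
```

Because only the top-ranked word-vectors are passed on, each later encoder layer processes fewer vectors, which is where the reduction in inference time comes from.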
Findings
Up to 4.5x reduction in BERT inference time with <1% accuracy loss
Up to 6.8x reduction in inference time when applied to ALBERT, with <1% accuracy loss
Outperforms prior methods in accuracy-time trade-off
Abstract
We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model while maintaining its accuracy. It works by (a) exploiting redundancy pertaining to word-vectors (intermediate encoder outputs) and eliminating the redundant vectors; (b) determining which word-vectors to eliminate by developing a strategy for measuring their significance, based on the self-attention mechanism; and (c) learning how many word-vectors to eliminate by augmenting the BERT model and the loss function. Experiments on the standard GLUE benchmark show that PoWER-BERT achieves up to 4.5x reduction in inference time over BERT with <1% loss in accuracy. We show that PoWER-BERT offers a significantly better trade-off between accuracy and inference time compared to prior methods. We demonstrate that our method attains up to 6.8x reduction in inference time with <1% loss in accuracy…
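Step (c) can be sketched as follows, under explicit assumptions. Each encoder layer j gets hypothetical soft retention weights r_j in [0, 1], one per rank position in significance order; during fine-tuning the k-th most significant word-vector is scaled by r_j[k]; and the loss adds a penalty λ·Σ_j j·Σ_k r_j[k], so that retaining word-vectors, especially in deeper layers, is costly. After training, each layer's soft weights are rounded up to an integer count. The depth weighting, the ceiling rounding, and the non-increasing clamp are choices made for this sketch, not details stated in the abstract; all function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical names throughout; this only illustrates the idea of learning
# how many word-vectors each encoder layer keeps.


def soft_extract(sorted_vectors: Sequence[Sequence[float]],
                 r: Sequence[float]) -> List[List[float]]:
    """Scale the k-th most significant word-vector by its retention weight r[k]."""
    return [[r_k * x for x in vec] for vec, r_k in zip(sorted_vectors, r)]


def regularized_loss(task_loss: float,
                     retention: Sequence[Sequence[float]],
                     lam: float) -> float:
    """Task loss plus lam * sum_j j * mass_j, where mass_j = sum_k r_j[k].

    Layers are 1-indexed; weighting by j makes retention in deeper layers
    more expensive (an assumption of this sketch).
    """
    penalty = sum(j * sum(r_j) for j, r_j in enumerate(retention, start=1))
    return task_loss + lam * penalty


def retention_config(retention: Sequence[Sequence[float]]) -> List[int]:
    """Round each layer's retained mass up to an integer word-vector count.

    A word-vector eliminated at one layer is gone for all later layers, so
    the counts are clamped to be non-increasing with depth.
    """
    counts: List[int] = []
    for r_j in retention:
        k = math.ceil(sum(r_j))
        counts.append(min(k, counts[-1]) if counts else k)
    return counts


r = [[1.0, 1.0, 0.5, 0.0],   # layer 1: mass 2.5
     [1.0, 0.6, 0.1, 0.0],   # layer 2: mass about 1.7
     [1.0, 0.2, 0.0, 0.0]]   # layer 3: mass 1.2
print(retention_config(r))   # [3, 2, 2]
```

Raising λ pushes the soft weights toward zero and yields smaller counts (faster inference), while the task loss pulls in the other direction; the balance between the two is what trades accuracy against inference time.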
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · LAMB · Dense Connections · Adam · WordPiece
