Elbert: Fast Albert with Confidence-Window Based Early Exit
Keli Xie, Siyuan Lu, Meiqi Wang, Zhongfeng Wang

TL;DR
ELBERT introduces a confidence-window based early exit mechanism for ALBERT. It accelerates inference by 2x to 10x with minimal accuracy loss, which makes it suitable for resource-constrained NLP applications.
Contribution
This work presents ELBERT, an efficient early exit method that accelerates ALBERT inference without extra training or additional parameters and outperforms existing early exit techniques.
Findings
Achieves 2x to 10x inference speedup
Maintains comparable accuracy to ALBERT
Outperforms existing early exit methods
Abstract
Despite their great success in the Natural Language Processing (NLP) area, large pre-trained language models like BERT are not well-suited for resource-constrained or real-time applications because of their large number of parameters and slow inference speed. Recently, compressing and accelerating BERT have become important topics. By incorporating a parameter-sharing strategy, ALBERT greatly reduces the number of parameters while achieving competitive performance. Nevertheless, ALBERT still suffers from long inference time. In this work, we propose ELBERT, which significantly improves the average inference speed compared to ALBERT through the proposed confidence-window based early exit mechanism, without introducing additional parameters or extra training overhead. Experimental results show that ELBERT achieves an adaptive inference speedup varying from 2 to 10 with…
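The abstract names a confidence-window based early exit but does not spell out the exit rule. The sketch below is a generic illustration under stated assumptions and is not the authors' method. It assumes each layer yields classifier logits, uses normalized softmax entropy as the (hypothetical) confidence measure, and exits either when one layer is confident enough or when the last `window` layers agree on the same class with moderately low entropy. The names `early_exit`, `entropy_threshold`, `window` and `window_threshold` and their default values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one layer's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy divided by log(num_classes), so it lies in [0, 1].

    Assumes at least two classes (log(1) would be zero).
    """
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def early_exit(
    layer_logits: Sequence[Sequence[float]],
    entropy_threshold: float = 0.3,  # hypothetical single-layer criterion
    window: int = 3,                 # hypothetical window length (layers)
    window_threshold: float = 0.5,   # hypothetical looser bound inside the window
) -> Tuple[int, int]:
    """Return (1-based exit layer, predicted class) for one input.

    A hypothetical sketch: in a real model the layers would be computed
    lazily and the loop would stop running the encoder once it exits.
    """
    history: List[Tuple[int, float]] = []  # (prediction, entropy) per layer
    for layer, logits in enumerate(layer_logits, start=1):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        ent = normalized_entropy(probs)
        history.append((pred, ent))

        # Criterion 1: this layer alone is confident enough.
        if ent < entropy_threshold:
            return layer, pred

        # Criterion 2: the last `window` layers agree and are all fairly confident.
        if len(history) >= window:
            recent = history[-window:]
            if all(p == pred for p, _ in recent) and all(
                e < window_threshold for _, e in recent
            ):
                return layer, pred

    # No early exit: fall back to the final layer's prediction.
    return len(layer_logits), history[-1][0]
```

For example, `early_exit([[0.1, 0.0], [0.5, 0.0], [3.0, 0.0]])` exits at layer 3 through the single-layer criterion. `early_exit([[2.2, 0.0]] * 3)` also exits at layer 3, but only through the window criterion, because no individual layer is below `entropy_threshold`. The window check lets a run of consistent but individually uncertain layers trigger an exit earlier than the final layer.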
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Weight Decay · Dropout · Multi-Head Attention · LAMB · Linear Warmup With Linear Decay · Residual Connection
