BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, and Furu Wei

TL;DR
This paper introduces Patience-based Early Exit, an inference method for pretrained language models that stops computation once the predictions of the intermediate classifiers stop changing, which speeds up inference and improves accuracy and robustness.
Contribution
It presents a plug-and-play early exit technique that attaches an internal classifier to each layer of a PLM and yields a better accuracy-speed trade-off than existing early exit methods.
Findings
Improves inference efficiency by reducing the number of layers used.
Enhances model robustness and accuracy by preventing overthinking.
Achieves better accuracy-speed trade-off compared to existing early exit methods.
Abstract
In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model (PLM). To achieve this, our approach couples an internal-classifier with each layer of a PLM and dynamically stops inference when the intermediate predictions of the internal classifiers remain unchanged for a pre-defined number of steps. Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers. Meanwhile, experimental results with an ALBERT model show that our method can improve the accuracy and robustness of the model by preventing it from overthinking and exploiting multiple classifiers for prediction, yielding a better accuracy-speed trade-off compared to existing early exit methods.
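The stopping rule described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `patience_early_exit`, the toy layers, and the toy classifiers are all hypothetical, and the counting convention (the counter goes up each time a prediction matches the previous layer's prediction and resets to zero otherwise) is an assumption about how "remain unchanged for a pre-defined number of steps" is measured.

```python
from typing import Callable, List, Sequence, Tuple


def patience_early_exit(
    layers: Sequence[Callable[[float], float]],
    classifiers: Sequence[Callable[[float], List[float]]],
    x: float,
    patience: int,
) -> Tuple[int, int]:
    """Run layers one at a time and stop once the predicted label has
    matched the previous layer's label `patience` times in a row.

    Returns (predicted_label, number_of_layers_used).
    Hypothetical helper for illustration only.
    """
    if not layers or len(layers) != len(classifiers):
        raise ValueError("need one internal classifier per layer")

    hidden = x
    prev_label = None
    streak = 0  # consecutive agreements with the previous prediction

    for depth, (layer, classifier) in enumerate(zip(layers, classifiers), start=1):
        hidden = layer(hidden)
        scores = classifier(hidden)
        # argmax over the class scores of this layer's internal classifier
        label = max(range(len(scores)), key=scores.__getitem__)

        if label == prev_label:
            streak += 1
        else:
            streak = 0  # prediction changed, so start counting again
        prev_label = label

        if streak >= patience:
            return label, depth  # early exit: skip the remaining layers

    # Patience was never reached: fall back to the last layer's prediction.
    return prev_label, len(layers)


# Toy stand-ins (assumptions): each "layer" adds 1 to a scalar state, and each
# "classifier" scores class 0 as (3 - h) and class 1 as h.
toy_layers = [lambda h: h + 1.0] * 6
toy_classifiers = [lambda h: [3.0 - h, h]] * 6

label, depth = patience_early_exit(toy_layers, toy_classifiers, 0.0, patience=2)
print(label, depth)  # prints: 1 4
```

In this toy run the prediction flips from class 0 to class 1 at the second layer and then stays put, so with a patience of 2 the loop exits after four of the six layers. A larger patience value trades speed for caution: it uses more layers before committing to a prediction.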
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
Methods: Linear Layer · Early exiting using confidence measures · Softmax · Adam · Dense Connections · Layer Normalization · Attention Is All You Need · WordPiece · Residual Connection
