DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks
Jianing He, Qi Zhang, Weiping Ding, Duoqian Miao, Jun Zhao, Liang Hu, Longbing Cao

TL;DR
DE$^3$-BERT introduces an early exiting method that uses prototypical networks to combine local entropy with global distance information, improving the speed-accuracy trade-off of BERT inference.
Contribution
This paper proposes DE$^3$-BERT, a new framework that leverages class prototypes and distance metrics to enhance early exiting decisions in BERT, addressing limitations of local-only approaches.
Findings
Outperforms state-of-the-art models on the GLUE benchmark
Achieves better speed-accuracy trade-offs with minimal overhead
Validates effectiveness across different speed-up ratios
Abstract
Early exiting has proven effective at accelerating inference in pre-trained language models such as BERT by dynamically adjusting the number of layers executed. However, most existing early exiting methods compute their exiting indicators from the local information of an individual test sample alone, failing to leverage the global information offered by the sample population. This leads to suboptimal estimation of prediction correctness, resulting in erroneous exiting decisions. To bridge this gap, we explore the necessity of effectively combining both local and global information to ensure reliable early exiting during inference. To this end, we leverage prototypical networks to learn class prototypes and devise a distance metric between samples and class prototypes. This enables us to utilize global information for estimating the correctness of early predictions. On…
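The idea in the abstract, combining a local signal (prediction entropy) with a global one (distance to class prototypes), can be illustrated with a minimal sketch. The abstract shown here is truncated and does not give the paper's exact distance metric or combination rule, so everything below is an assumption made for illustration: prototypes are taken as per-class mean embeddings, the distance signal is the ratio of the nearest to the second-nearest prototype distance, and the two signals are mixed with a hypothetical weighted sum controlled by `weight`. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Local signal: entropy scaled to [0, 1]; needs at least two classes."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def class_prototypes(embeddings, labels):
    """Assumed prototype definition: the mean embedding of each class."""
    sums, counts = {}, {}
    for vec, y in zip(embeddings, labels):
        if y not in sums:
            sums[y] = [0.0] * len(vec)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], vec)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def distance_ratio(embedding, prototypes):
    """Global signal (assumed form): nearest / second-nearest prototype distance.

    Values near 0 mean the sample sits close to exactly one class prototype;
    values near 1 mean it is ambiguous. Needs at least two prototypes.
    """
    dists = sorted(math.dist(embedding, p) for p in prototypes.values())
    if dists[1] == 0.0:
        return 1.0
    return dists[0] / dists[1]


def should_exit(logits, embedding, prototypes, threshold, weight=0.5):
    """Hypothetical combined indicator: exit when the mixed score is low."""
    local = normalized_entropy(softmax(logits))
    global_ = distance_ratio(embedding, prototypes)
    score = weight * local + (1.0 - weight) * global_
    return score < threshold, score


# Toy usage: two classes in a 2-D embedding space.
protos = class_prototypes([[0, 0], [0, 2], [4, 0], [4, 2]], [0, 0, 1, 1])
confident, _ = should_exit([4.0, 0.0], [0.0, 1.0], protos, threshold=0.3)
ambiguous, _ = should_exit([0.1, 0.0], [2.0, 1.0], protos, threshold=0.3)
```

In this toy setup the first sample has a peaked prediction and lies on a class prototype, so it exits early; the second has a near-uniform prediction and lies midway between the prototypes, so it continues to the next layer. In a multi-exit BERT, a check like this would run at each internal classifier, with `threshold` controlling the speed-accuracy trade-off.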
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Dropout · Multi-Head Attention · Layer Normalization · Residual Connection · Adam · Dense Connections
