CEEBERT: Cross-Domain Inference in Early Exit BERT
Divya Jyoti Bajpai, Manjesh Kumar Hanawal

TL;DR
CeeBERT is an online adaptive method that enables early inference in BERT models, significantly reducing latency by dynamically selecting exit points based on confidence levels without labeled data.
Contribution
The paper introduces CeeBERT, an online learning algorithm for cross-domain early-exit inference in BERT that optimizes the speed-accuracy trade-off without requiring labeled data.
Findings
Speeds up BERT/ALBERT models by 2-3.5x with minimal accuracy loss.
Effectively adapts to cross-domain data distributions.
Reduces unnecessary computation during inference.
Abstract
Pre-trained Language Models (PLMs) such as BERT, trained with self-supervision objectives, exhibit remarkable performance and generalization across various tasks. However, they suffer from high inference latency due to their large size. To address this issue, side branches are attached at intermediate layers, enabling early inference of samples without requiring them to pass through all layers. The challenge is then to decide at which layer each sample should be inferred and exited so that accuracy and latency are balanced. Moreover, the distribution of the samples to be inferred may differ from that used for training, necessitating cross-domain adaptation. We propose an online learning algorithm named Cross-Domain Inference in Early Exit BERT (CeeBERT) that dynamically determines early exits of samples based on the level of confidence at each exit point. CeeBERT learns optimal thresholds from domain-specific…
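The confidence-based exit rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `early_exit_predict`, the use of the maximum softmax probability as the confidence score, and the single fixed `threshold` are all assumptions made for illustration. How CeeBERT learns its thresholds online is not shown here.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one exit's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    exit_logits: Sequence[Sequence[float]], threshold: float
) -> Tuple[int, int, float]:
    """Return (exit_layer, predicted_label, confidence) for one sample.

    exit_logits[i] holds the class logits produced by the side branch
    attached after layer i + 1 (hypothetical layout, assumed here).
    The sample exits at the first layer whose confidence reaches the
    threshold; otherwise it is classified at the final layer.
    """
    if not exit_logits:
        raise ValueError("exit_logits must not be empty")
    last_layer = len(exit_logits)
    for layer, logits in enumerate(exit_logits, start=1):
        probs = softmax(logits)
        confidence = max(probs)  # assumed confidence measure
        label = probs.index(confidence)
        if confidence >= threshold or layer == last_layer:
            return layer, label, confidence
    raise AssertionError("unreachable: the final layer always returns")


# Three hypothetical exits for a two-class task.
sample = [[0.1, 0.2], [0.0, 3.0], [5.0, 0.0]]
layer, label, conf = early_exit_predict(sample, threshold=0.9)
print(f"exited at layer {layer} with label {label} (confidence {conf:.3f})")
```

A lower threshold lets more samples exit at shallow layers, which saves computation but risks accepting low-confidence predictions; a higher threshold pushes samples deeper into the model. Choosing that threshold for a new domain is the trade-off the abstract refers to.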
Taxonomy
Topics: Nuclear Materials and Properties · Age of Information Optimization · Healthcare Operations and Scheduling Optimization
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Linear Layer · LAMB · Adam · Residual Connection · Multi-Head Attention
