Deploying a BERT-based Query-Title Relevance Classifier in a Production System: a View from the Trenches
Leonard Dahlmann, Tomer Lancewicki

TL;DR
This paper presents a compact BERT-based model, BertBiLSTM, optimized for low-latency production environments, outperforming standard BERT in accuracy and inference speed for query-title relevance tasks.
Contribution
The authors develop a novel, efficient model called BertBiLSTM, trained via knowledge distillation and data augmentation, suitable for real-time industrial NLP applications.
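The summary says only that BertBiLSTM is trained via knowledge distillation. It does not give the loss. As a minimal sketch, assume the common soft-target formulation, with a larger fine-tuned teacher and binary relevant/irrelevant labels. The student then minimizes two terms, combined as a weighted sum:

- a temperature-softened KL divergence to the teacher's distribution, scaled by T²;
- ordinary cross-entropy on the gold label.

The names `distillation_loss`, `temperature`, and `alpha` are hypothetical, and the defaults are illustrative rather than the authors' settings.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    hard_label: int,
    temperature: float = 2.0,  # hypothetical default, not taken from the paper
    alpha: float = 0.5,        # hypothetical weight between soft and hard terms
) -> float:
    """Soft-target KD loss for one example (an assumed formulation, not the paper's stated loss).

    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, hard_label)
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence of the softened teacher distribution from the softened student one.
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    soft_term = kl * temperature ** 2  # T^2 keeps gradient scale comparable across temperatures
    hard_term = -math.log(softmax(student_logits)[hard_label])  # cross-entropy at T = 1
    return alpha * soft_term + (1 - alpha) * hard_term
```

When the student's logits match the teacher's, the KL term vanishes and only the weighted cross-entropy remains. The loss grows as the student drifts from the teacher's soft predictions.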
Findings
BertBiLSTM processes an input in at most 0.2 ms on CPU.
BertBiLSTM surpasses off-the-shelf BERT in accuracy.
The model outperforms other compact models in production settings.
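The latency finding is stated as a worst-case bound ("at most 0.2 ms"). A benchmark for such a claim therefore needs to report the maximum per-input time, not only an average. This sketch makes three assumptions:

- inference is single-input (batch size 1) and runs on CPU;
- `measure_latency_ms` and `toy_predict` are hypothetical names;
- `toy_predict` is a trivial token-overlap stand-in, not BertBiLSTM.

```python
import math
import statistics
import time
from typing import Callable, Dict, Sequence, Tuple


def measure_latency_ms(
    predict: Callable[[str, str], float],
    pairs: Sequence[Tuple[str, str]],
    warmup: int = 10,
) -> Dict[str, float]:
    """Time one-input-at-a-time calls to `predict` and summarize per-call latency in milliseconds."""
    if not pairs:
        raise ValueError("need at least one (query, title) pair")
    for query, title in pairs[:warmup]:
        predict(query, title)  # untimed warm-up calls (caches, lazy initialization)
    timings = []
    for query, title in pairs:
        start = time.perf_counter()
        predict(query, title)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    p99_rank = max(0, math.ceil(0.99 * len(timings)) - 1)  # nearest-rank 99th percentile
    return {
        "mean_ms": statistics.fmean(timings),
        "p50_ms": statistics.median(timings),
        "p99_ms": timings[p99_rank],
        "max_ms": timings[-1],  # worst case: the figure an "at most X ms" claim bounds
    }


def toy_predict(query: str, title: str) -> float:
    """Hypothetical stand-in for a relevance model: share of query tokens that appear in the title."""
    q_tokens = query.lower().split()
    t_tokens = set(title.lower().split())
    return sum(tok in t_tokens for tok in q_tokens) / max(len(q_tokens), 1)
```

Timing one input per call reflects per-query serving. A throughput-oriented benchmark would batch inputs instead and report a different number.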
Abstract
The Bidirectional Encoder Representations from Transformers (BERT) model has substantially improved the performance of many Natural Language Processing (NLP) tasks, such as Text Classification and Named Entity Recognition (NER). However, BERT is challenging to scale to low-latency, high-throughput industrial use cases because of its enormous size. We successfully optimize a Query-Title Relevance (QTR) classifier for deployment via a compact model, which we name BERT Bidirectional Long Short-Term Memory (BertBiLSTM). The model can process an input in at most 0.2 ms on CPU. BertBiLSTM exceeds the off-the-shelf BERT model's performance in terms of accuracy and efficiency on this real-world production task. We achieve this result in two phases. First, we create a pre-trained model, called eBERT, which is the original BERT architecture trained with…
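The QTR task scores a search query against an item title. The abstract does not specify the input format. The sketch here assumes the classifier consumes the pair the way BERT sentence-pair models usually do: `[CLS] query [SEP] title [SEP]`, with segment ids 0 for the query part and 1 for the title part. Three details are stand-ins:

- whitespace tokenization is a placeholder for WordPiece;
- `encode_query_title` and `truncate_pair` are hypothetical helpers;
- longest-first truncation mirrors the pair truncation in the original BERT reference code.

```python
from typing import List, Tuple


def truncate_pair(a: List[str], b: List[str], max_tokens: int) -> None:
    """Trim the longer token list one token at a time until both fit (longest-first), in place."""
    while len(a) + len(b) > max_tokens:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()


def encode_query_title(query: str, title: str, max_len: int = 32) -> Tuple[List[str], List[int]]:
    """Build an assumed BERT-style sentence-pair input: [CLS] query [SEP] title [SEP]."""
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP] tokens")
    # Assumption: lowercase whitespace splitting stands in for a real WordPiece tokenizer.
    q_tokens = query.lower().split()
    t_tokens = title.lower().split()
    truncate_pair(q_tokens, t_tokens, max_len - 3)  # reserve room for the three special tokens
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + t_tokens + ["[SEP]"]
    # Segment ids: 0 covers [CLS], the query, and the first [SEP]; 1 covers the title and final [SEP].
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(t_tokens) + 1)
    return tokens, segment_ids
```

Titles are typically longer than queries. Longest-first truncation therefore tends to shorten the title first and keep the query intact.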
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
Methods: Attention Is All You Need · Linear Layer · Knowledge Distillation · Residual Connection · WordPiece · Dropout · Dense Connections · Layer Normalization · Adam
