Deploying a BERT-based Query-Title Relevance Classifier in a Production System: a View from the Trenches

Leonard Dahlmann; Tomer Lancewicki

arXiv:2108.10197·cs.CL·August 24, 2021

TL;DR

This paper presents a compact BERT-based model, BertBiLSTM, optimized for low-latency production environments, outperforming standard BERT in accuracy and inference speed for query-title relevance tasks.

Contribution

The authors develop a novel, efficient model called BertBiLSTM, trained via knowledge distillation and data augmentation, suitable for real-time industrial NLP applications.
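As an illustration of the knowledge distillation step mentioned above, here is a minimal sketch of a commonly used distillation loss: a weighted sum of a soft-target term (student versus temperature-softened teacher probabilities) and a hard-label cross-entropy term. The text here does not specify the authors' exact loss, so the function names, the `temperature` and `alpha` parameters, and their default values are assumptions chosen for illustration, not the paper's settings.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical distillation objective (not taken from the paper).

    soft: cross-entropy between softened teacher and student distributions
    hard: cross-entropy between the student distribution and the true label
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student_soft))

    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])

    # The T^2 factor keeps the soft-term gradients on a scale comparable
    # to the hard term as the temperature changes.
    return alpha * (temperature ** 2) * soft + (1 - alpha) * hard
```

For a binary relevant/not-relevant classifier, `student_logits` and `teacher_logits` would each hold two values per query-title pair; a student whose logits agree with the teacher's receives a lower loss than one that disagrees.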

Findings

1. BertBiLSTM infers an input in at most 0.2 ms on CPU.

2. BertBiLSTM surpasses off-the-shelf BERT in accuracy.

3. The model outperforms other compact models in production settings.

Abstract

The Bidirectional Encoder Representations from Transformers (BERT) model has been radically improving the performance of many Natural Language Processing (NLP) tasks such as Text Classification and Named Entity Recognition (NER) applications. However, it is challenging to scale BERT for low-latency and high-throughput industrial use cases due to its enormous size. We successfully optimize a Query-Title Relevance (QTR) classifier for deployment via a compact model, which we name BERT Bidirectional Long Short-Term Memory (BertBiLSTM). The model is capable of inferring an input in at most 0.2ms on CPU. BertBiLSTM exceeds the off-the-shelf BERT model's performance in terms of accuracy and efficiency for the aforementioned real-world production task. We achieve this result in two phases. First, we create a pre-trained model, called eBERT, which is the original BERT architecture trained with…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management

Methods: Attention Is All You Need · Linear Layer · Knowledge Distillation · Residual Connection · WordPiece · Dropout · Dense Connections · Layer Normalization · Adam