TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval

Wenhao Lu; Jian Jiao; Ruofei Zhang

arXiv:2002.06275·cs.IR·February 18, 2020·28 cites

TL;DR

TwinBERT introduces a twin-structured BERT model that encodes query and document separately, so document embeddings can be pre-computed. This significantly reduces inference time while maintaining high retrieval performance.

Contribution

The paper proposes a novel twin-structured BERT model with independent encoders for query and document, enabling offline document embedding and improved retrieval efficiency.

Findings

01 Inference time reduced to around 20 ms on CPUs.

02 Performance comparable to BERT-Base in retrieval tasks.

03 Significant improvements in relevance metrics with minimal latency impact.

Abstract

Pre-trained language models like BERT have achieved great success in a wide variety of NLP tasks, but their superior performance comes with a high demand for computational resources, which hinders their application in low-latency IR systems. We present the TwinBERT model for effective and efficient retrieval, which has twin-structured BERT-like encoders to represent query and document respectively, and a crossing layer to combine the embeddings and produce a similarity score. Unlike BERT, where the two input sentences are concatenated and encoded together, TwinBERT decouples them during encoding and produces the embeddings for query and document independently, which allows document embeddings to be pre-computed offline and cached in memory. The only computation left for run time is therefore the query encoding and the query-document crossing. This single change can save a large amount of…
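The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_encode` is a hypothetical hashed bag-of-words stand-in for a BERT-like encoder, cosine similarity is assumed as the crossing layer, and the class and function names are invented for illustration. What it shows is the split between the offline step (encode and cache every document once) and the run-time step (encode the query, then cross it with the cached vectors).

```python
import hashlib
import math

DIM = 16  # toy embedding size (assumption, chosen only for illustration)


def toy_encode(text, dim=DIM):
    """Hypothetical stand-in for a BERT-like encoder.

    Builds a hashed bag-of-words count vector and L2-normalises it.
    A real system would run a transformer encoder here.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cross(query_vec, doc_vec):
    """Assumed crossing layer: cosine similarity.

    Both inputs are unit vectors, so the dot product is the cosine.
    """
    return sum(q * d for q, d in zip(query_vec, doc_vec))


class TwinRetriever:
    """Caches document embeddings offline; encodes only the query at run time."""

    def __init__(self, documents):
        self.documents = list(documents)
        # Offline step: each document is encoded exactly once and kept in memory.
        self.doc_cache = [toy_encode(doc) for doc in self.documents]

    def score(self, query):
        # Run-time step: one query encoding plus one cheap crossing per document.
        query_vec = toy_encode(query)
        return [cross(query_vec, doc_vec) for doc_vec in self.doc_cache]


if __name__ == "__main__":
    retriever = TwinRetriever(["cheap flights to paris", "learn python programming"])
    for doc, s in zip(retriever.documents, retriever.score("flights to paris")):
        print(f"{s:.3f}  {doc}")
```

In this sketch the per-query cost is one encoder call regardless of how many documents are indexed, whereas a cross-encoder that concatenates query and document would need one encoder call per query-document pair.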

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax