Simplified TinyBERT: Knowledge Distillation for Document Retrieval

Xuanang Chen; Ben He; Kai Hui; Le Sun; Yingfei Sun

arXiv:2009.07531 · cs.IR · May 5, 2023 · 6 cites

TL;DR

This paper introduces Simplified TinyBERT, a distilled model for document ranking that applies two simplifications to TinyBERT and significantly outperforms BERT-Base in ranking effectiveness while running 15× faster.

Contribution

The paper proposes two simplifications to TinyBERT and demonstrates their effectiveness in improving document ranking performance and efficiency.

Findings

01. Simplified TinyBERT significantly outperforms BERT-Base in ranking effectiveness.

02. The model achieves a 15× speedup over BERT-Base.

03. The proposed simplifications also improve on the original TinyBERT for document ranking.

Abstract

Despite the effectiveness of the BERT model for document ranking, the high computational cost of such approaches limits their use. To this end, this paper first empirically investigates the effectiveness of two knowledge distillation models on the document ranking task. In addition, two simplifications are proposed on top of the recently proposed TinyBERT model. Evaluations on two different and widely used benchmarks demonstrate that Simplified TinyBERT with the proposed simplifications not only improves on TinyBERT, but also significantly outperforms BERT-Base while providing a 15× speedup.
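For readers unfamiliar with knowledge distillation, the general idea can be sketched in a few lines. The example below is a generic soft-label distillation loss, not the procedure used in the paper: the abstract does not describe the two simplifications, so none are modeled here. The function names, the `temperature` and `alpha` parameters, and the two-class relevance setup are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy between the teacher's and student's softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=1.0):
    """Weighted sum of a soft-label term and a hard-label term.

    alpha and temperature are hypothetical hyperparameters chosen
    for this sketch; they are not taken from the paper.
    """
    soft = soft_cross_entropy(student_logits, teacher_logits, temperature)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1.0 - alpha) * hard


# Hypothetical logits for one query-document pair: [non-relevant, relevant].
teacher = [-1.0, 2.0]
good_student = [-0.8, 1.9]
poor_student = [1.5, -0.5]

print(distillation_loss(good_student, teacher, label=1))
print(distillation_loss(poor_student, teacher, label=1))
```

A student whose logits track the teacher's receives a lower loss than one that disagrees, which is the signal a small ranking model would be trained on in this kind of setup.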

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Natural Language Processing Techniques

Methods: Linear Layer · Knowledge Distillation · Softmax · Layer Normalization · Weight Decay · Dropout · Linear Warmup With Linear Decay · Dense Connections · Attention Dropout · WordPiece