Simplified TinyBERT: Knowledge Distillation for Document Retrieval
Xuanang Chen, Ben He, Kai Hui, Le Sun, Yingfei Sun

TL;DR
This paper introduces Simplified TinyBERT, a more efficient model for document retrieval that outperforms BERT-Base in speed and accuracy through knowledge distillation and model simplifications.
Contribution
The paper proposes two simplifications to TinyBERT and demonstrates their effectiveness in improving document ranking performance and efficiency.
Findings
Simplified TinyBERT outperforms BERT-Base in accuracy.
The model achieves 15× speedup over BERT-Base.
Knowledge distillation enhances document retrieval performance.
Abstract
Despite the effectiveness of utilizing the BERT model for document ranking, the high computational cost of such approaches limits their use. To this end, this paper first empirically investigates the effectiveness of two knowledge distillation models on the document ranking task. In addition, on top of the recently proposed TinyBERT model, two simplifications are proposed. Evaluations on two different and widely used benchmarks demonstrate that Simplified TinyBERT with the proposed simplifications not only boosts TinyBERT, but also significantly outperforms BERT-Base while providing a 15× speedup.
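The abstract does not spell out the distillation objective, so the following is only a generic illustration of soft-label knowledge distillation (in the style of Hinton et al.), not the paper's exact loss. It assumes the ranker scores each query-document pair as a binary relevance classifier with logits [non-relevant, relevant]. The student is trained to match the teacher's temperature-softened distribution, mixed with ordinary cross-entropy on the ground-truth label. The function names, the weight `alpha`, and the `temperature` value are hypothetical choices for this sketch.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_logits: Sequence[float],
                    teacher_logits: Sequence[float],
                    temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 so gradient
    magnitudes stay comparable across temperatures (generic KD, assumed here)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    return (temperature ** 2) * kl


def hard_label_loss(student_logits: Sequence[float], label: int) -> float:
    """Cross-entropy of the student against the ground-truth relevance label."""
    p_s = softmax(student_logits)
    return -math.log(p_s[label])


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      label: int,
                      alpha: float = 0.5,
                      temperature: float = 2.0) -> float:
    """Hypothetical weighted mix of the soft (teacher) and hard (ground-truth) losses."""
    soft = soft_label_loss(student_logits, teacher_logits, temperature)
    hard = hard_label_loss(student_logits, label)
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    # One query-document pair: logits are [non-relevant, relevant] (illustrative values).
    teacher = [-1.2, 2.3]   # e.g. from a large BERT-Base ranker
    student = [-0.3, 0.9]   # e.g. from a small TinyBERT-style ranker
    print(distillation_loss(student, teacher, label=1))
```

Setting `alpha=1.0` trains on the teacher's soft labels alone; lower values let the ground-truth labels pull the student as well. In a real training loop these losses would be computed on batched tensors in a framework such as PyTorch rather than on Python lists.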
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Natural Language Processing Techniques
Methods: Linear Layer · Knowledge Distillation · Softmax · Layer Normalization · Weight Decay · Dropout · Linear Warmup With Linear Decay · Dense Connections · Attention Dropout · WordPiece
