PROD: Progressive Distillation for Dense Retrieval
Zhenghao Lin, Yeyun Gong, Xiao Liu, Hang Zhang, Chen Lin, Anlei Dong, Jian Jiao, Jingwen Lu, Daxin Jiang, Rangan Majumder, Nan Duan

TL;DR
PROD is a progressive distillation method for dense retrieval that gradually improves the student model by narrowing the gap between teacher and student, achieving state-of-the-art results among distillation methods on five benchmarks.
Contribution
The paper proposes a progressive distillation method that combines teacher progressive distillation with data progressive distillation to improve student models for dense retrieval.
Findings
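PROD achieves state-of-the-art performance among distillation methods for dense retrieval on five benchmarks: MS MARCO Passage, TREC Passage 19, TREC Document 19, MS MARCO Document, and Natural Questions.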
Progressive distillation narrows the teacher-student gap that can otherwise cause a stronger teacher to produce a worse student.
Extensive experiments validate the effectiveness of progressive distillation.
Abstract
Knowledge distillation is an effective way to transfer knowledge from a strong teacher to an efficient student model. Ideally, we expect the better the teacher is, the better the student. However, this expectation does not always come true. It is common that a better teacher model results in a bad student via distillation due to the nonnegligible gap between teacher and student. To bridge the gap, we propose PROD, a PROgressive Distillation method, for dense retrieval. PROD consists of a teacher progressive distillation and a data progressive distillation to gradually improve the student. We conduct extensive experiments on five widely-used benchmarks, MS MARCO Passage, TREC Passage 19, TREC Document 19, MS MARCO Document and Natural Questions, where PROD achieves the state-of-the-art within the distillation methods for dense retrieval. The code and models will be released.
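The abstract does not spell out the training objective, so the following is only a rough sketch of generic score-level knowledge distillation for a retriever, not PROD's actual loss. It assumes the teacher and student each assign a relevance score to the same candidate passages for one query, and that the student is trained to match the teacher's softmax distribution under KL divergence. The function names and the temperature parameter are illustrative assumptions, not taken from the paper.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw relevance scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over the candidate passages of one query.

    Assumption: a generic distillation objective, used here for
    illustration only; the paper's loss may differ.
    """
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores for three candidate passages of a single query.
teacher = [3.0, 1.0, 0.0]
close_student = [2.0, 1.0, 0.0]  # ranks passages like the teacher
far_student = [0.0, 1.0, 3.0]    # ranks passages in the opposite order

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

In this toy setting the loss is larger for the student whose ranking disagrees with the teacher, which gives one concrete way to think about the teacher-student gap that the abstract describes.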
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Topic Modeling
