LEAD: Liberal Feature-based Distillation for Dense Retrieval

Hao Sun; Xiao Liu; Yeyun Gong; Anlei Dong; Jingwen Lu; Yan Zhang; Linjun Yang; Rangan Majumder; Nan Duan

arXiv:2212.05225·cs.IR·December 12, 2023

TL;DR

LEAD introduces a flexible feature-based distillation method that aligns intermediate layer distributions between teacher and student models, improving dense retrieval performance without constraints on vocabularies or architectures.

Contribution

The paper proposes LEAD, a novel, extendable, and architecture-agnostic feature-based distillation approach for dense retrieval models.

Findings

01. LEAD outperforms baseline methods on MS MARCO and TREC benchmarks.

02. It is effective across different model architectures and datasets.

03. LEAD is portable and does not require specific vocabularies or tokenizers.

Abstract

Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model. Traditional methods include response-based methods and feature-based methods. Response-based methods are widely used but have a lower performance ceiling because they ignore intermediate signals, while feature-based methods impose constraints on vocabularies, tokenizers, and model architectures. In this paper, we propose a liberal feature-based distillation method (LEAD). LEAD aligns the distribution between the intermediate layers of the teacher model and the student model; it is effective, extendable, and portable, and has no requirements on vocabularies, tokenizers, or model architectures. Extensive experiments show the effectiveness of LEAD on widely used benchmarks, including MS MARCO Passage Ranking, TREC 2019 DL Track, MS MARCO Document Ranking and TREC…
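
The abstract describes the alignment idea only at a high level, so the following is a minimal sketch under stated assumptions, not the paper's actual loss. It assumes each layer yields one vector per query and per passage, turns query-passage dot products into a softmax distribution over candidate passages, and averages KL(teacher || student) over paired layers. All function names and the toy vectors are hypothetical. Because only score distributions are compared, the teacher and student hidden sizes need not match, which is one way the "no requirements on architectures" property could be realized.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def layer_distribution(query_vec, passage_vecs):
    """Assumed form: softmax over query-passage dot products at one layer."""
    return softmax([dot(query_vec, p) for p in passage_vecs])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def layer_alignment_loss(teacher_layers, student_layers):
    """Average KL(teacher || student) over paired layers.

    Each list element is a (query_vec, passage_vecs) tuple for one layer.
    The layer pairing itself is an assumption of this sketch.
    """
    losses = []
    for (t_q, t_ps), (s_q, s_ps) in zip(teacher_layers, student_layers):
        t_dist = layer_distribution(t_q, t_ps)
        s_dist = layer_distribution(s_q, s_ps)
        losses.append(kl_divergence(t_dist, s_dist))
    return sum(losses) / len(losses)


# Toy data: teacher hidden size 3, student hidden size 2, three candidate passages.
teacher = [([1.0, 0.0, 0.5], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]])]
student_good = [([1.0, 0.2], [[1.0, 0.0], [0.0, 1.0], [0.7, 0.4]])]
student_bad = [([1.0, 0.2], [[0.0, 1.0], [1.0, 0.0], [0.1, 0.1]])]

if __name__ == "__main__":
    print("aligned student loss:   ", layer_alignment_loss(teacher, student_good))
    print("misaligned student loss:", layer_alignment_loss(teacher, student_bad))
```

In this toy setup, the student whose passage ranking mirrors the teacher's receives the smaller loss, even though the two models use different embedding dimensions.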

Code & Models

Repositories

microsoft/simxns
PyTorch · Official

Taxonomy

Topics: Machine Learning and Data Classification · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Knowledge Distillation