Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation

Sebastian Hofstätter; Sophia Althammer; Michael Schröder; Mete Sertkan; Allan Hanbury

arXiv:2010.02666 · cs.IR · January 25, 2021 · 64 cites

Open Access 1 Repo 10 Models

TL;DR

This paper introduces a cross-architecture knowledge distillation method with a margin-focused loss to enhance the effectiveness of efficient neural ranking models, bridging the gap with larger models without sacrificing efficiency.

Contribution

It proposes a novel Margin-MSE loss for distillation that accounts for score distribution differences across architectures, improving neural ranking performance.
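To make the idea concrete, here is a minimal sketch of a margin-based MSE loss in plain Python. It assumes the usual formulation of Margin-MSE: for each training triple (query, relevant passage, non-relevant passage), the student is trained to match the teacher's margin between the two passage scores rather than the absolute scores. The function name `margin_mse`, its parameters, and the example scores are illustrative assumptions, not code from the official repository.

```python
from typing import Sequence


def margin_mse(
    student_pos: Sequence[float],
    student_neg: Sequence[float],
    teacher_pos: Sequence[float],
    teacher_neg: Sequence[float],
) -> float:
    """Mean squared error between student and teacher score margins.

    Each argument holds one score per training triple; index i of every
    sequence refers to the same (query, relevant, non-relevant) triple.
    """
    lengths = {len(student_pos), len(student_neg), len(teacher_pos), len(teacher_neg)}
    if len(lengths) != 1 or not student_pos:
        raise ValueError("all score lists must be non-empty and of equal length")

    total = 0.0
    for s_pos, s_neg, t_pos, t_neg in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        student_margin = s_pos - s_neg  # how much the student prefers the relevant passage
        teacher_margin = t_pos - t_neg  # how much the teacher prefers it
        total += (student_margin - teacher_margin) ** 2
    return total / len(student_pos)


# Made-up scores on very different scales (hypothetical values for illustration).
teacher_pos, teacher_neg = [12.0, 9.0], [8.0, 8.5]          # teacher margins: 4.0, 0.5
student_pos, student_neg = [104.0, 100.5], [100.0, 100.0]   # student margins: 4.0, 0.5

# Same margins despite different score ranges, so the loss is zero.
loss_matching = margin_mse(student_pos, student_neg, teacher_pos, teacher_neg)

# A student with margins 1.0 and 1.0 is penalized for the mismatch.
loss_off = margin_mse([1.0, 1.0], [0.0, 0.0], teacher_pos, teacher_neg)
```

Because only the differences between scores enter the loss, the student is free to produce scores in whatever range suits its architecture, which is the property that matters when teacher and student score magnitudes differ.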

Findings

01 Significant effectiveness gains across multiple architectures.

02 Improved retrieval performance with no efficiency loss.

03 Enhanced nearest neighbor retrieval with distillation.

Abstract

Retrieval and ranking models are the backbone of many applications such as web search, open-domain QA, or text-based recommender systems. The latency of neural ranking models at query time largely depends on the architecture and on deliberate choices by their designers to trade off effectiveness for higher efficiency. This focus on low query latency makes a rising number of efficient ranking architectures feasible for production deployment. In machine learning, an increasingly common approach to closing the effectiveness gap of more efficient models is to apply knowledge distillation from a large teacher model to a smaller student model. We find that different ranking architectures tend to produce output scores of different magnitudes. Based on this finding, we propose a cross-architecture training procedure with a margin-focused loss (Margin-MSE) that adapts knowledge…

Code & Models

Repositories

sebastian-hofstaetter/neural-ranking-kd
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Topic Modeling

Methods: Linear Layer · Knowledge Distillation · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay