Improving Bi-encoder Document Ranking Models with Two Rankers and Multi-teacher Distillation

Jaekeol Choi; Euna Jung; Jangwon Suh; Wonjong Rhee

arXiv:2103.06523 · cs.IR · August 9, 2021

TL;DR

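This paper introduces TRMD, a multi-teacher distillation method in which a bi-encoder neural ranking model learns from two teachers, a cross-encoder and a bi-encoder, and combines the relevance scores of two rankers. The student keeps the bi-encoder's efficiency (documents can be pre-processed before query time) while improving ranking performance.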

Contribution

The paper proposes TRMD, a multi-teacher distillation approach that produces a bi-encoder student with two rankers. The student learns simultaneously from a cross-encoder teacher and a bi-encoder teacher.

Findings

01. TRMD outperforms baseline bi-encoders in relevance ranking accuracy.

02. The largest improvement observed with TRMD is 11.4% in P@20.

03. TRMD also enhances cross-encoder models when used for distillation.
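
For readers unfamiliar with the metric in the second finding, P@20 is precision over the top 20 ranked documents. The sketch below shows how such a figure is typically computed. The function names, document IDs, and the `relative_improvement` helper are hypothetical illustrations and are not taken from the paper or its repository. Whether the reported 11.4% is a relative or an absolute gain is not stated in this summary, so the helper assumes a relative gain.

```python
from typing import Sequence, Set


def precision_at_k(ranked_doc_ids: Sequence[str], relevant_ids: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Standard P@k divides by k even if fewer than k documents were returned.
    return hits / k


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline` (assumption: gains are relative)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Hypothetical ranking of four documents, two of which are relevant.
ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
p_at_4 = precision_at_k(ranking, relevant, k=4)
```

With k=20, the same function gives the P@20 value for one query; averaging over all queries gives the figure usually reported for a test collection.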

Abstract

BERT-based Neural Ranking Models (NRMs) can be classified according to how the query and document are encoded through BERT's self-attention layers - bi-encoder versus cross-encoder. Bi-encoder models are highly efficient because all the documents can be pre-processed before the query time, but their performance is inferior compared to cross-encoder models. Both models utilize a ranker that receives BERT representations as the input and generates a relevance score as the output. In this work, we propose a method where multi-teacher distillation is applied to a cross-encoder NRM and a bi-encoder NRM to produce a bi-encoder NRM with two rankers. The resulting student bi-encoder achieves an improved performance by simultaneously learning from a cross-encoder teacher and a bi-encoder teacher and also by combining relevance scores from the two rankers. We call this method TRMD (Two Rankers…
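
The two ideas in the abstract, learning from two teachers at once and combining the scores of two rankers, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the student matches each teacher's representation with a mean-squared-error loss, that each ranker is a simple linear layer, and that the final relevance score is the sum of the two ranker scores. All function names are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Ranker = Callable[[Vector], float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def linear_ranker(weights: Vector, bias: float = 0.0) -> Ranker:
    """Build a ranker that maps a representation to a scalar relevance score."""
    def score(rep: Vector) -> float:
        if len(rep) != len(weights):
            raise ValueError("representation size does not match ranker weights")
        return sum(w * x for w, x in zip(weights, rep)) + bias
    return score


def multi_teacher_loss(student_rep_1: Vector, student_rep_2: Vector,
                       cross_teacher_rep: Vector, bi_teacher_rep: Vector) -> float:
    """Assumed distillation loss: one student representation follows each teacher."""
    return mse(student_rep_1, cross_teacher_rep) + mse(student_rep_2, bi_teacher_rep)


def two_ranker_score(student_rep_1: Vector, student_rep_2: Vector,
                     ranker_1: Ranker, ranker_2: Ranker) -> float:
    """Assumed combination rule: sum the relevance scores of the two rankers."""
    return ranker_1(student_rep_1) + ranker_2(student_rep_2)


# Toy two-dimensional representations, purely for illustration.
ranker_1 = linear_ranker([1.0, 0.0])
ranker_2 = linear_ranker([0.0, 2.0], bias=0.5)
loss = multi_teacher_loss([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [0.0, 2.0])
score = two_ranker_score([0.2, 0.4], [0.1, 0.3], ranker_1, ranker_2)
```

In a real system the representations would come from BERT encoders and the loss would be minimized by gradient descent. The sketch only shows how the two teacher signals and the two ranker scores fit together.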

Code & Models

Repositories

maygodwithu/TRMD (PyTorch, official)

Taxonomy

Methods: Linear Layer · Residual Connection · Linear Warmup With Linear Decay · Weight Decay · Multi-Head Attention · Dense Connections · Softmax · Layer Normalization · Attention Dropout