TL;DR
This paper introduces TRMD, a distillation method that improves bi-encoder neural ranking models by training a student bi-encoder under two teachers, a cross-encoder and a bi-encoder, so that the student keeps bi-encoder efficiency while ranking more accurately.
Contribution
The paper proposes TRMD, a multi-teacher distillation approach that distills a cross-encoder teacher and a bi-encoder teacher into a student bi-encoder with two rankers, whose relevance scores are combined to improve ranking.
Findings
TRMD outperforms baseline bi-encoders in relevance ranking accuracy.
The largest observed improvement with TRMD is 11.4% in P@20.
TRMD also enhances cross-encoder models when used for distillation.
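For readers unfamiliar with the metric, P@20 is precision at a cutoff of 20: the fraction of the top 20 ranked documents that are relevant. The sketch below shows the standard definition only; the function name and the binary relevance labels are illustrative assumptions, and the summary does not say whether the 11.4% figure is a relative or an absolute gain.

```python
from typing import Sequence


def precision_at_k(ranked_relevance: Sequence[int], k: int = 20) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    ranked_relevance holds hypothetical binary labels (1 = relevant,
    0 = not relevant), ordered by the model's ranking, best first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_relevance[:k]
    # Standard P@k divides by k even when fewer than k documents are returned.
    return sum(top_k) / k


# Toy ranking of five documents; labels are made up for illustration.
labels = [1, 0, 1, 1, 0]
p_at_5 = precision_at_k(labels, k=5)
p_at_2 = precision_at_k(labels, k=2)
```

With these toy labels, three of the top five documents are relevant, and one of the top two.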
Abstract
BERT-based Neural Ranking Models (NRMs) can be classified according to how the query and document are encoded through BERT's self-attention layers: bi-encoder versus cross-encoder. Bi-encoder models are highly efficient because all the documents can be pre-processed before the query time, but their performance is inferior to that of cross-encoder models. Both models utilize a ranker that receives BERT representations as the input and generates a relevance score as the output. In this work, we propose a method where multi-teacher distillation is applied to a cross-encoder NRM and a bi-encoder NRM to produce a bi-encoder NRM with two rankers. The resulting student bi-encoder achieves improved performance by simultaneously learning from a cross-encoder teacher and a bi-encoder teacher and also by combining relevance scores from the two rankers. We call this method TRMD (Two Rankers…
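The two-teacher setup described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the abstract does not state what quantity is distilled or how the two rankers' scores are combined, so this example distills per-document scores with a mean-squared-error loss and combines the two rankers with a weighted average. The names `StudentScores`, `two_teacher_loss`, `combine_scores`, `alpha`, and `weight` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length score lists."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


@dataclass
class StudentScores:
    # Scores from the student ranker paired with the cross-encoder teacher.
    ranker_cross: List[float]
    # Scores from the student ranker paired with the bi-encoder teacher.
    ranker_bi: List[float]


def two_teacher_loss(student: StudentScores,
                     cross_teacher: Sequence[float],
                     bi_teacher: Sequence[float],
                     alpha: float = 0.5) -> float:
    """Weighted sum of one distillation term per teacher (assumed form)."""
    return (alpha * mse(student.ranker_cross, cross_teacher)
            + (1.0 - alpha) * mse(student.ranker_bi, bi_teacher))


def combine_scores(student: StudentScores, weight: float = 0.5) -> List[float]:
    """Final relevance score per document; a weighted average is an assumption."""
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(student.ranker_cross, student.ranker_bi)]


# Toy scores for two documents; all numbers are made up.
student = StudentScores(ranker_cross=[1.0, 0.0], ranker_bi=[0.5, 0.5])
loss = two_teacher_loss(student, cross_teacher=[0.0, 0.0], bi_teacher=[0.5, 0.5])
final_scores = combine_scores(student)
```

The point of the sketch is the structure: each student ranker has its own teacher signal, and ranking at query time uses a combination of both rankers' outputs.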
Taxonomy
Methods: Linear Layer · Residual Connection · Linear Warmup With Linear Decay · Weight Decay · Multi-Head Attention · Dense Connections · Softmax · Layer Normalization · Attention Dropout
