Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins

Lukas Gienapp; Niklas Deckers; Martin Potthast; Harrisen Scells

arXiv:2407.21515·cs.IR·June 24, 2025

TL;DR

This paper introduces a self-supervised loss function for training bi-encoder retrieval models that removes the need for a teacher model and for batch sampling, while matching the effectiveness of teacher distillation using only 13.5% of the training data and training 3x to 15x faster.

Contribution

The authors propose a parameter-free self-distillation loss that uses the encoder's own pre-trained language modeling capabilities as the training signal and performs implicit hard negative mining, so that neither a teacher model nor batch sampling is needed.
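The page does not state the loss formula, so the following is only a rough, hedged illustration of how a teacher-free, margin-based loss can work, not the authors' published loss. It assumes each query is paired with one relevant document, treats the other documents in the batch as negatives, and uses a hypothetical per-pair margin of 1 - cos(d+, d-) taken from the encoder's own embeddings in place of a teacher's scores. Because a hinge is zero for negatives already separated by their margin, only under-separated (hard) negatives contribute, which is one way implicit hard negative mining can arise. The names `cosine` and `adaptive_margin_loss` are hypothetical, and plain Python lists stand in for encoder outputs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def adaptive_margin_loss(queries: List[Vector], docs: List[Vector]) -> float:
    """Hypothetical teacher-free hinge loss with self-derived margins.

    queries[i] is paired with its relevant document docs[i]; every docs[j]
    with j != i serves as an in-batch negative for queries[i]. The margin
    for each (positive, negative) pair is 1 - cos(docs[i], docs[j]), taken
    from the encoder's own embeddings instead of a teacher model. In a real
    autograd framework the margin would be detached from the gradient.
    """
    n = len(queries)
    if n != len(docs) or n < 2:
        raise ValueError("need at least two aligned (query, document) pairs")
    total, count = 0.0, 0
    for i in range(n):
        pos_score = cosine(queries[i], docs[i])
        for j in range(n):
            if j == i:
                continue
            neg_score = cosine(queries[i], docs[j])
            # Similar documents get a small required gap, dissimilar ones a large one.
            margin = 1.0 - cosine(docs[i], docs[j])
            # Hinge: zero once the positive beats the negative by the margin,
            # so only hard (under-separated) negatives add to the loss.
            total += max(0.0, margin - (pos_score - neg_score))
            count += 1
    return total / count
```

With queries aligned to their documents, `adaptive_margin_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])` returns 0.0 because every negative is already separated by its margin; swapping the two queries makes each negative outrank its positive, and the loss rises to 2.0.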

Findings

1. Self-distillation matches the effectiveness of teacher distillation while using less training data.
2. Training is 3x to 15x faster than with teacher distillation.
3. The approach needs only 13.5% of the training data used by teacher distillation.

Abstract

Representation-based retrieval models, so-called bi-encoders, estimate the relevance of a document to a query by calculating the similarity of their respective embeddings. Current state-of-the-art bi-encoders are trained using an expensive training regime involving knowledge distillation from a teacher model and batch-sampling. Instead of relying on a teacher model, we contribute a novel parameter-free loss function for self-supervision that exploits the pre-trained language modeling capabilities of the encoder model as a training signal, eliminating the need for batch sampling by performing implicit hard negative mining. We investigate the capabilities of our proposed approach through extensive experiments, demonstrating that self-distillation can match the effectiveness of teacher distillation using only 13.5% of the data, while offering a speedup in training time between 3x and 15x…
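The abstract's first sentence describes how a bi-encoder scores relevance: the query and each document are encoded separately by the same encoder, and their embeddings are compared with a similarity function. A minimal sketch, assuming cosine similarity (dot product is also common); `VOCAB` and `toy_encode` are hypothetical stand-ins for a pre-trained transformer encoder, used only so the example runs without dependencies.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy vocabulary; a real bi-encoder uses a transformer encoder.
VOCAB = ["retrieval", "bi-encoder", "distillation", "teacher", "loss", "cooking"]


def toy_encode(text: str) -> List[float]:
    """Stand-in encoder: term counts over VOCAB (not a learned model)."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in VOCAB]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: str, index: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Encode the query once and score it against precomputed document embeddings."""
    q = toy_encode(query)
    scored = [(doc_id, cosine(q, emb)) for doc_id, emb in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Documents are encoded independently of any query, so this index can be built offline.
corpus = {
    "d1": "teacher distillation loss",
    "d2": "cooking recipes",
    "d3": "bi-encoder retrieval",
}
index = {doc_id: toy_encode(text) for doc_id, text in corpus.items()}
ranking = rank("bi-encoder retrieval loss", index)
print([doc_id for doc_id, _ in ranking])  # ['d3', 'd1', 'd2']
```

Because document embeddings do not depend on the query, a bi-encoder can index a corpus once and answer each query with a single encoder pass plus similarity computations; the quality of the ranking then rests entirely on how well the encoder was trained.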

Taxonomy

Methods: Knowledge Distillation