Distribution-Aligned Fine-Tuning for Efficient Neural Retrieval
Jurek Leonhardt, Marcel Jahnke, Avishek Anand

TL;DR
This paper introduces DAFT, a two-stage fine-tuning method that aligns heterogeneous dual-encoder models to prevent collapsing representations, enabling efficient neural retrieval with lightweight query encoders.
Contribution
The paper proposes DAFT, a two-stage fine-tuning approach that aligns the two encoders of a heterogeneous dual-encoder, so that a lightweight query encoder can be used for efficient retrieval without its representations collapsing.
Findings
Heterogeneous dual-encoders are prone to collapsing representations during standard fine-tuning.
DAFT effectively prevents collapsing by aligning the encoders in a two-stage process.
Using DAFT, lightweight query encoders achieve competitive retrieval performance.
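To make "collapsing representations" concrete, here is a minimal diagnostic sketch. It is not taken from the paper: the function names and the toy vectors are hypothetical, and mean pairwise cosine similarity is just one common way to check whether an encoder emits near-identical outputs for different inputs.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings (assumption, not data from the paper).
# "collapsed": the encoder returns the same vector for every input.
collapsed = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
# "spread": each input maps to a different direction.
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

collapsed_score = mean_pairwise_cosine(collapsed)
spread_score = mean_pairwise_cosine(spread)
```

A score close to 1 over a sample of different queries would suggest the encoder's outputs are nearly constant. The orthogonal toy vectors score 0.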
Abstract
Dual-encoder-based neural retrieval models achieve appreciable performance and complement traditional lexical retrievers well due to their semantic matching capabilities, which makes them a common choice for hybrid IR systems. However, these models exhibit a performance bottleneck in the online query encoding step, as the corresponding query encoders are usually large and complex Transformer models. In this paper we investigate heterogeneous dual-encoder models, where the two encoders are separate models that do not share parameters or initializations. We empirically show that heterogeneous dual-encoders are susceptible to collapsing representations, causing them to output constant trivial representations when they are fine-tuned using a standard contrastive loss due to a distribution mismatch. We propose DAFT, a simple two-stage fine-tuning approach that aligns the two encoders in…
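The abstract refers to a "standard contrastive loss" without spelling it out. The sketch below assumes a softmax cross-entropy over in-batch dot-product scores, a common choice for dual-encoders. The function name and the toy vectors are hypothetical, and this is not the paper's DAFT procedure. It only illustrates what a constant, trivial representation does to such a loss.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(queries, docs):
    """Mean of -log softmax(score) for the matching pair.

    Assumption: docs[i] is the positive document for queries[i];
    every other document in the batch acts as a negative.
    """
    total = 0.0
    for i, q in enumerate(queries):
        scores = [dot(q, d) for d in docs]
        # Numerically stable log-sum-exp over the batch scores.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(queries)


# Hypothetical toy batch (assumption, not data from the paper).
aligned_q = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
aligned_d = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]

# Collapsed case: both encoders output the same vector for every input.
constant_q = [[1.0, 1.0, 1.0]] * 3
constant_d = [[1.0, 1.0, 1.0]] * 3

aligned_loss = in_batch_contrastive_loss(aligned_q, aligned_d)
collapsed_loss = in_batch_contrastive_loss(constant_q, constant_d)
```

With constant outputs every score in the batch is equal, so the loss sits at the logarithm of the batch size (log 3 here) regardless of the inputs. The well-separated toy batch drives it close to zero.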
Taxonomy
Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: Attention Is All You Need · Label Smoothing · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Linear Layer · Multi-Head Attention · Adam · Absolute Position Encodings · Layer Normalization
