Distribution-Aligned Fine-Tuning for Efficient Neural Retrieval

Jurek Leonhardt; Marcel Jahnke; Avishek Anand

arXiv:2211.04942 · cs.IR · November 10, 2022 · 1 citation

Open Access

TL;DR

This paper introduces DAFT, a two-stage fine-tuning method that aligns heterogeneous dual-encoder models to prevent collapsing representations, enabling efficient neural retrieval with lightweight query encoders.

Contribution

The paper proposes DAFT, a two-stage fine-tuning approach that aligns the two encoders of a heterogeneous dual-encoder, so that a lightweight query encoder can be paired with a separate document encoder without the representations collapsing.

Findings

01. Heterogeneous dual-encoders are prone to collapsing representations during standard fine-tuning.

02. DAFT effectively prevents collapsing by aligning the encoders in a two-stage process.

03. Using DAFT, lightweight query encoders achieve competitive retrieval performance.
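
To make "collapsing representations" concrete, here is a minimal sketch of one way to check a batch of encoder outputs for collapse. It is not taken from the paper: the helper names `mean_pairwise_cosine` and `looks_collapsed`, the cosine-based criterion, and the 0.99 threshold are all illustrative assumptions.

```python
import math


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors.

    Assumes at least two vectors, none of which is the zero vector.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    pairs = [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    ]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def looks_collapsed(vectors, threshold=0.99):
    """Hypothetical heuristic: flag a batch whose embeddings are nearly identical.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    return mean_pairwise_cosine(vectors) >= threshold


# Toy embeddings: the first batch spreads out, the second is constant.
healthy = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

print(looks_collapsed(healthy))
print(looks_collapsed(collapsed))
```

In practice one would run such a check on the outputs of the query encoder for a sample of distinct queries; an encoder that returns (nearly) the same vector for every input carries no information for ranking.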

Abstract

Dual-encoder-based neural retrieval models achieve appreciable performance and complement traditional lexical retrievers well due to their semantic matching capabilities, which makes them a common choice for hybrid IR systems. However, these models exhibit a performance bottleneck in the online query encoding step, as the corresponding query encoders are usually large and complex Transformer models. In this paper we investigate heterogeneous dual-encoder models, where the two encoders are separate models that do not share parameters or initializations. We empirically show that heterogeneous dual-encoders are susceptible to collapsing representations, causing them to output constant trivial representations when they are fine-tuned using a standard contrastive loss due to a distribution mismatch. We propose DAFT, a simple two-stage fine-tuning approach that aligns the two encoders in…
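
The "standard contrastive loss" mentioned in the abstract is not spelled out on this page. The sketch below assumes a common variant: dot-product scoring with in-batch negatives, where document i is the positive for query i. The function name `in_batch_contrastive_loss` and the toy vectors are hypothetical, and the vectors stand in for the outputs of two separate encoders.

```python
import math


def dot(a, b):
    """Dot-product relevance score between a query and a document vector."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_contrastive_loss(query_vecs, doc_vecs):
    """Mean cross-entropy over a batch with in-batch negatives.

    Assumption (not from the paper): doc_vecs[i] is the positive for
    query_vecs[i], and every other document in the batch is a negative.
    """
    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [dot(q, d) for d in doc_vecs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[i]
    return total / len(query_vecs)


queries = [[1.0, 0.0], [0.0, 1.0]]

# Well-separated document vectors: each query scores its own document highest.
aligned_docs = [[5.0, 0.0], [0.0, 5.0]]

# Constant document vectors: every document gets the same score.
constant_docs = [[1.0, 1.0], [1.0, 1.0]]

loss_aligned = in_batch_contrastive_loss(queries, aligned_docs)
loss_constant = in_batch_contrastive_loss(queries, constant_docs)

print(loss_aligned)
print(loss_constant, math.log(len(queries)))
```

When one side outputs a constant vector, all scores for a query are equal and the loss sits at the chance level log N for a batch of N documents, which is why constant representations are useless for ranking.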

Taxonomy

Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Attention Is All You Need · Label Smoothing · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Linear Layer · Multi-Head Attention · Adam · Absolute Position Encodings · Layer Normalization