A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks

Nicholas Monath; Will Grathwohl; Michael Boratko; Rob Fergus; Andrew McCallum; Manzil Zaheer

arXiv:2409.01890 · cs.LG · September 4, 2024

Open Access

TL;DR

This paper introduces a small, scalable corrector network that adjusts stale cached target embeddings in dense retrieval, reducing re-embedding cost by 4-80x while still matching state-of-the-art results.

Contribution

It proposes a novel small parametric corrector network that improves training efficiency and accuracy in dense retrieval with stale embeddings.

Findings

01. Achieves state-of-the-art results without updating target embeddings during training.

02. Reduces re-embedding computational cost by 4-80x.

03. Theoretically analyzes the generalization of the corrector network.

Abstract

In dense retrieval, deep encoders provide embeddings for both inputs and targets, and the softmax function is used to parameterize a distribution over a large number of candidate targets (e.g., textual passages for information retrieval). Significant challenges arise in training such encoders in the increasingly prevalent scenario of (1) a large number of targets, (2) a computationally expensive target encoder model, and (3) cached target embeddings that are out-of-date due to ongoing training of target encoder parameters. This paper presents a simple and highly scalable response to these challenges by training a small parametric corrector network that adjusts stale cached target embeddings, enabling an accurate softmax approximation and thereby sampling of up-to-date high scoring "hard negatives." We theoretically investigate the generalization properties of our proposed target corrector,…
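
The mechanism described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the residual linear form of the corrector, the names `correct`, `sample_hard_negatives`, `W`, and `b`, and the random data are all assumptions made for illustration. The corrector weights here are untrained; in practice they would be fit so that corrected stale embeddings approximate the current target encoder's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
num_targets, dim = 1000, 16

# Stale cache: target embeddings computed once by an older encoder checkpoint.
stale_cache = rng.normal(size=(num_targets, dim))

# Hypothetical corrector parameters (assumption: a residual linear map).
W = 0.01 * rng.normal(size=(dim, dim))
b = np.zeros(dim)

def correct(stale):
    """Adjust stale embeddings with a small residual correction."""
    return stale + stale @ W + b

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def sample_hard_negatives(query, positive_id, k):
    """Sample k distinct negatives from the softmax over corrected-cache scores."""
    scores = correct(stale_cache) @ query
    scores[positive_id] = -np.inf  # exclude the gold target from sampling
    probs = softmax(scores)
    return rng.choice(num_targets, size=k, replace=False, p=probs)

query = rng.normal(size=dim)
negatives = sample_hard_negatives(query, positive_id=3, k=8)
```

High-scoring targets receive most of the probability mass, so the sampled indices tend to be "hard" negatives for the query, and no target is re-encoded by the expensive encoder during sampling.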

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Softmax