A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks
Nicholas Monath, Will Grathwohl, Michael Boratko, Rob Fergus, Andrew McCallum, Manzil Zaheer

TL;DR
This paper introduces a scalable corrector network to adjust stale cached embeddings in dense retrieval, significantly reducing re-embedding costs while maintaining state-of-the-art performance.
Contribution
Proposes a small parametric corrector network that adjusts stale cached target embeddings, improving training efficiency and accuracy in dense retrieval.
Findings
Achieves state-of-the-art results without updating target embeddings during training.
Reduces re-embedding computational cost by 4-80x.
Theoretically analyzes the generalization of the corrector network.
Abstract
In dense retrieval, deep encoders provide embeddings for both inputs and targets, and the softmax function is used to parameterize a distribution over a large number of candidate targets (e.g., textual passages for information retrieval). Significant challenges arise in training such encoders in the increasingly prevalent scenario of (1) a large number of targets, (2) a computationally expensive target encoder model, (3) cached target embeddings that are out-of-date due to ongoing training of target encoder parameters. This paper presents a simple and highly scalable response to these challenges by training a small parametric corrector network that adjusts stale cached target embeddings, enabling an accurate softmax approximation and thereby sampling of up-to-date high scoring "hard negatives." We theoretically investigate the generalization properties of our proposed target corrector,…
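The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The residual linear form of the corrector, the parameter names `W` and `b`, and the helpers `correct`, `approx_softmax`, and `sample_hard_negatives` are all assumptions made for illustration. The sketch shows only the inference-side idea: adjust cached embeddings with a small network, score them against a query, and sample negatives from the resulting approximate softmax.

```python
import numpy as np

def correct(stale, W, b):
    # Hypothetical corrector: a residual linear map over cached embeddings.
    # With W = 0 and b = 0 it returns the stale embeddings unchanged.
    return stale + stale @ W + b

def approx_softmax(query, stale, W, b):
    # Score every target using corrected (not re-encoded) embeddings.
    scores = correct(stale, W, b) @ query
    scores = scores - scores.max()  # subtract the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def sample_hard_negatives(query, stale, W, b, positive, k, rng):
    # Sample k distinct negatives in proportion to the approximate softmax,
    # so high-scoring ("hard") targets are drawn more often.
    p = approx_softmax(query, stale, W, b)
    p[positive] = 0.0  # never return the true target as a negative
    p = p / p.sum()
    return rng.choice(len(p), size=k, replace=False, p=p)

rng = np.random.default_rng(0)
stale = rng.normal(size=(50, 8))      # cached target embeddings (N=50, d=8)
W, b = np.zeros((8, 8)), np.zeros(8)  # corrector starts as the identity map
query = rng.normal(size=8)
negatives = sample_hard_negatives(query, stale, W, b, positive=3, k=5, rng=rng)
```

In training, `W` and `b` would be fit so that the corrected embeddings track the current target encoder; the summary above does not specify how that is done, so the sketch leaves it out.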
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Softmax
