Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining
Nicholas Monath, Manzil Zaheer, Kelsey Allen, Andrew McCallum

TL;DR
This paper presents a dynamic indexing method for dual encoder training that efficiently finds hard negatives, significantly reducing memory usage and error compared to static indexes and prior methods.
Contribution
Introduces a tree-based dynamic index and a Nyström approximation for efficient negative mining in dual-encoder training, so that the index keeps pace with changing model parameters and scales to large target sets.
Findings
Halves the error compared to brute-force negative mining.
Uses 150x less memory than previous state-of-the-art methods.
Effective on datasets with over twenty million targets.
Abstract
Dual encoder models are ubiquitous in modern classification and retrieval. Crucial for training such dual encoders is an accurate estimation of gradients from the partition function of the softmax over the large output space; this requires finding negative targets that contribute most significantly ("hard negatives"). Since dual encoder model parameters change during training, the use of traditional static nearest neighbor indexes can be sub-optimal. These static indexes (1) periodically require expensive re-building of the index, which in turn requires (2) expensive re-encoding of all targets using updated model parameters. This paper addresses both of these challenges. First, we introduce an algorithm that uses a tree structure to approximate the softmax with provable bounds and that dynamically maintains the tree. Second, we approximate the effect of a gradient update on target…
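To make the role of hard negatives in the partition function concrete, here is a minimal sketch in plain Python. It is not the paper's tree-based index: it uses brute-force mining (score every target, keep the top k), which is the expensive baseline the abstract contrasts against. The dot-product scoring, the random Gaussian embeddings, and all names (`softmax_loss`, `mine_hard_negatives`, `dim`, `num_targets`, `k`) are assumptions chosen for illustration.

```python
import math
import random


def dot(u, v):
    """Dual-encoder score: inner product of query and target embeddings."""
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of scores."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softmax_loss(query, targets, positive, candidates):
    """Cross-entropy loss where the partition function is summed only
    over the positive target plus the given candidate negatives."""
    pos_score = dot(query, targets[positive])
    neg_scores = [dot(query, targets[j]) for j in candidates if j != positive]
    return log_sum_exp([pos_score] + neg_scores) - pos_score


def mine_hard_negatives(query, targets, positive, k):
    """Brute-force mining: the k highest-scoring non-positive targets."""
    scored = [(dot(query, t), j) for j, t in enumerate(targets) if j != positive]
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


# Hypothetical toy setup: random embeddings standing in for encoder outputs.
rng = random.Random(0)
dim, num_targets, k = 8, 2000, 20
targets = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_targets)]
query = [rng.gauss(0, 1) for _ in range(dim)]
positive = 0

# Exact loss: partition function over every target.
exact = softmax_loss(query, targets, positive, range(num_targets))
# Truncated loss using the k hardest negatives.
hard_ids = mine_hard_negatives(query, targets, positive, k)
hard = softmax_loss(query, targets, positive, hard_ids)
# Truncated loss using k uniformly sampled negatives (no bias correction).
uniform_ids = rng.sample(range(1, num_targets), k)
uniform = softmax_loss(query, targets, positive, uniform_ids)

print(f"exact loss (all {num_targets} targets): {exact:.4f}")
print(f"top-{k} hard negatives:           {hard:.4f}")
print(f"{k} uniform negatives:            {uniform:.4f}")
```

Both truncated losses underestimate the exact loss because they drop non-negative terms from the partition function. For a fixed budget k, the top-k hard negatives retain the largest possible share of that sum, so their estimate is never further from the exact value than that of any other k-subset. The cost here is a full scan of the targets for every query, which is what an index over the targets is meant to avoid.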
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Adversarial Robustness in Machine Learning
Methods: Softmax
