Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining

Nicholas Monath; Manzil Zaheer; Kelsey Allen; Andrew McCallum

arXiv:2303.15311 · cs.LG · March 28, 2023 · 1 citation

Open Access

TL;DR

This paper presents a dynamic indexing method for dual-encoder training that finds hard negatives efficiently, reducing both memory usage and error relative to static indexes and prior methods.

Contribution

Introduces a tree-based dynamic index and a Nyström approximation for efficient negative mining in dual-encoder training, addressing two problems: model parameters that change during training and very large target sets.

Findings

01. Halves the error compared to brute-force negative mining.

02. Uses 150x less memory than previous state-of-the-art methods.

03. Effective on datasets with over twenty million targets.

Abstract

Dual encoder models are ubiquitous in modern classification and retrieval. Crucial for training such dual encoders is an accurate estimation of gradients from the partition function of the softmax over the large output space; this requires finding negative targets that contribute most significantly ("hard negatives"). Since dual encoder model parameters change during training, the use of traditional static nearest neighbor indexes can be sub-optimal. These static indexes (1) periodically require expensive re-building of the index, which in turn requires (2) expensive re-encoding of all targets using updated model parameters. This paper addresses both of these challenges. First, we introduce an algorithm that uses a tree structure to approximate the softmax with provable bounds and that dynamically maintains the tree. Second, we approximate the effect of a gradient update on target…
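To make the role of hard negatives concrete, here is a minimal sketch in plain Python, not the paper's tree-based algorithm. It assumes dot-product scoring between a query embedding and target embeddings, and it finds the top-k negatives by brute force. All names (`softmax_loss`, `hard_negatives`, the toy sizes) are illustrative assumptions. It compares the softmax loss computed over all targets with the loss computed over only k hard negatives and over k uniformly sampled negatives.

```python
import math
import random


def dot(u, v):
    """Dot-product score between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(xs):
    """Numerically stable log of a sum of exponentials."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softmax_loss(query, targets, positive, negatives):
    """Cross-entropy with the partition function restricted to the
    positive target plus the given negative target indices."""
    pos_score = dot(query, targets[positive])
    scores = [pos_score] + [dot(query, targets[j]) for j in negatives]
    return log_sum_exp(scores) - pos_score


def hard_negatives(query, targets, positive, k):
    """Brute-force search for the k highest-scoring non-positive targets.
    (A real system would use an index here; this is the exhaustive baseline.)"""
    scored = [(dot(query, t), j) for j, t in enumerate(targets) if j != positive]
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


# Toy data: random embeddings stand in for encoder outputs (an assumption).
rng = random.Random(0)
dim, num_targets, k = 8, 2000, 20
targets = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_targets)]
query = [rng.gauss(0, 1) for _ in range(dim)]
positive = 0

all_negs = [j for j in range(num_targets) if j != positive]
exact = softmax_loss(query, targets, positive, all_negs)
hard = softmax_loss(query, targets, positive,
                    hard_negatives(query, targets, positive, k))
uniform = softmax_loss(query, targets, positive, rng.sample(all_negs, k))

print(f"exact loss over all targets : {exact:.4f}")
print(f"loss with {k} hard negatives : {hard:.4f}")
print(f"loss with {k} uniform negs   : {uniform:.4f}")
```

Both truncated losses are lower bounds on the exact loss, because they drop terms from the partition function. The hard-negative version is always at least as close to the exact value as any other choice of k negatives, which is why finding the highest-scoring negatives matters. The brute-force search above must score every target, and those scores go stale whenever the encoder parameters change; that is the cost the abstract attributes to static indexes.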

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Adversarial Robustness in Machine Learning

Methods: Softmax