ASTRA: Accurate and Scalable ANNS-based Training of Extreme Classifiers
Sonu Mehta, Jayashree Mohan, Nagarajan Natarajan, Ramachandran Ramjee, Manik Varma

TL;DR
ASTRA introduces a scalable and accurate training algorithm for extreme classifiers that leverages approximate nearest neighbor search with a novel negative sampling strategy, achieving state-of-the-art precision and significantly faster training on large datasets.
Contribution
The paper proposes ASTRA, a new method that improves training efficiency and accuracy for extreme classifiers by aligning negative sampling with the loss function and using a mixed sampling approach.
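One way to picture a "mixed sampling approach" is sketched below. This summary does not say how ASTRA builds its mix, so the combination shown here (a few top-scoring non-positive labels plus a few uniformly drawn ones) is an assumption for illustration only. The function name `sample_mixed_negatives` and its parameters are hypothetical, and exact inner-product search stands in for an ANNS index.

```python
import numpy as np

def sample_mixed_negatives(query_emb, label_embs, positive_ids,
                           num_hard=4, num_random=4, rng=None):
    """Illustrative mix of hard and uniform negatives (not the paper's exact scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    num_labels = label_embs.shape[0]
    pos = np.asarray(sorted(positive_ids), dtype=int)

    # Score every label by inner product; a real system would query an ANNS index.
    scores = (label_embs @ query_emb).astype(float)
    scores[pos] = -np.inf  # positives must never be returned as negatives

    # Hard negatives: highest-scoring labels that are not positives.
    hard = np.argsort(-scores)[:num_hard]

    # Random negatives: uniform draw from the labels not already used.
    excluded = set(pos.tolist()) | set(hard.tolist())
    pool = np.array([i for i in range(num_labels) if i not in excluded], dtype=int)
    size = min(num_random, len(pool))
    random_part = rng.choice(pool, size=size, replace=False)
    return hard, random_part

# Toy usage: 6 labels with one-hot embeddings, label 0 is the positive.
label_embs = np.eye(6)
query_emb = np.array([0.9, 0.8, 0.7, 0.1, 0.0, 0.2])
hard, random_part = sample_mixed_negatives(
    query_emb, label_embs, positive_ids=[0],
    num_hard=2, num_random=2, rng=np.random.default_rng(0))
```

In this toy case the hard negatives are labels 1 and 2, the two highest-scoring labels after the positive is masked out. The random part is drawn from the remaining labels 3, 4, and 5.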
Findings
ASTRA achieves state-of-the-art precision on large-scale datasets.
Training time is reduced by 4x-15x compared to previous methods.
A negative sampling strategy aligned with the loss function improves classifier accuracy.
Abstract
"Extreme Classification" (or XC) is the task of annotating data points (queries) with relevant labels (documents), from an extremely large set of possible labels, arising in search and recommendations. The most successful deep learning paradigm that has emerged over the last decade or so for XC is to embed the queries (and labels) using a deep encoder (e.g., DistilBERT), and use linear classifiers on top of the query embeddings. This architecture is of appeal because it enables millisecond-time inference using approximate nearest neighbor search (ANNS). The key question is how do we design training algorithms that are accurate as well as scale to labels on a limited number of GPUs. State-of-the-art XC techniques that demonstrate high accuracies (e.g., DEXML, Renée, DEXA) on standard datasets have per-epoch training time that scales as or employ expensive…
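The inference step described in the abstract, scoring labels by applying per-label linear classifiers to a query embedding, can be sketched as a top-k inner-product search. This is a minimal sketch under stated assumptions: the name `top_k_labels` and the toy weights are hypothetical, the query embedding is assumed to come from an encoder that is not shown, and exact search is used where a production system would use an ANNS index.

```python
import numpy as np

def top_k_labels(query_emb, classifier_weights, k=5):
    """Return the ids of the k highest-scoring labels, best first."""
    # One linear classifier per label: the score is the inner product with the query.
    scores = classifier_weights @ query_emb
    k = min(k, scores.shape[0])
    # argpartition finds the top-k set without fully sorting every label.
    top = np.argpartition(-scores, k - 1)[:k]
    # Sort only the k candidates so the result is ordered by score.
    return top[np.argsort(-scores[top])]

# Toy usage: 4 labels, 2-dimensional embeddings.
classifier_weights = np.array([[1.0, 0.0],
                               [0.0, 1.0],
                               [0.5, 0.5],
                               [-1.0, 0.0]])
query_emb = np.array([1.0, 0.2])
predicted = top_k_labels(query_emb, classifier_weights, k=2)
```

The exact search above costs time linear in the number of labels per query. An ANNS index replaces that scan with an approximate lookup, which is what makes the millisecond-time inference mentioned in the abstract possible.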
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Neural Networks and Applications · Machine Learning and Data Classification
Methods: Sparse Evolutionary Training
