Stochastic Negative Mining for Learning with Large Output Spaces
Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar

TL;DR
This paper introduces Stochastic Negative Mining, a practical and theoretically grounded method for label retrieval in large output spaces that improves over traditional negative sampling techniques.
Contribution
The paper develops a family of calibrated convex surrogate losses and proposes Stochastic Negative Mining for efficient large-scale label retrieval.
Findings
Stochastic Negative Mining outperforms standard negative sampling methods.
The surrogate losses are shown to be calibrated and convex under certain conditions.
Generalization error bounds are established for the proposed losses.
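To make the idea behind the method's name concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common negative-mining recipe: draw a random subset of negative classes, keep the k highest-scoring ones among them, and average a hinge loss over those. The function name `snm_hinge_loss`, its parameters, and the choice of hinge loss are all hypothetical; the paper's actual loss family may differ.

```python
import random

def snm_hinge_loss(scores, label, num_sampled, k, rng, margin=1.0):
    """Hypothetical sketch: hinge loss over the k hardest of a random sample of negatives."""
    # Enumerating all negatives is done here only for clarity; with a very
    # large output space one would sample class indices directly instead.
    negatives = [c for c in range(len(scores)) if c != label]
    # Stochastic step: sample a subset of negative classes without replacement.
    sampled = rng.sample(negatives, num_sampled)
    # Mining step: keep the k highest-scoring (hardest) sampled negatives.
    hardest = sorted(sampled, key=lambda c: scores[c], reverse=True)[:k]
    # Penalize each mined negative that scores within `margin` of the true label.
    losses = [max(0.0, margin + scores[c] - scores[label]) for c in hardest]
    return sum(losses) / len(losses)

rng = random.Random(0)
scores = [2.0, 0.5, 1.8, -1.0, 0.2]  # made-up scores for 5 classes
loss = snm_hinge_loss(scores, label=0, num_sampled=3, k=2, rng=rng)
```

With `k` equal to `num_sampled`, this sketch reduces to plain negative sampling; with `num_sampled` covering all negatives, it reduces to a top-k hard-negative loss.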
Abstract
We consider the problem of retrieving the most relevant labels for a given input when the size of the output space is very large. Retrieval methods are modeled as set-valued classifiers which output a small set of classes for each input, and a mistake is made if the label is not in the output set. Despite its practical importance, a statistically principled, yet practical solution to this problem is largely missing. To this end, we first define a family of surrogate losses and show that they are calibrated and convex under certain conditions on the loss parameters and data distribution, thereby establishing a statistical and analytical basis for using these losses. Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i.e. computation is…
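The set-valued error described in the abstract can be illustrated with a small sketch. It assumes the classifier outputs the k highest-scoring classes, which is one natural choice but is not stated in the text above; the helper names `top_k_set` and `retrieval_error` and the example scores are illustrative only.

```python
def top_k_set(scores, k):
    """Return the indices of the k highest-scoring classes as a set."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return set(ranked[:k])

def retrieval_error(score_rows, labels, k):
    """Fraction of inputs whose true label is missing from the output set."""
    mistakes = sum(
        1 for scores, y in zip(score_rows, labels)
        if y not in top_k_set(scores, k)
    )
    return mistakes / len(labels)

# Two inputs, four classes, made-up scores.
score_rows = [
    [0.1, 0.9, 0.3, 0.5],  # output set for k=2 is {1, 3}
    [0.8, 0.2, 0.7, 0.1],  # output set for k=2 is {0, 2}
]
labels = [3, 1]  # the first label is retrieved, the second is not
err = retrieval_error(score_rows, labels, k=2)
```

Sorting all classes is fine for a toy example; with a very large output space a partial selection or approximate nearest-neighbor search would typically replace the full sort.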
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Grey System Theory Applications
