Fast Label Embeddings for Extremely Large Output Spaces
Paul Mineiro, Nikos Karampatziakis

TL;DR
This paper introduces a fast, randomized label embedding algorithm for multiclass and multilabel problems with very large output spaces. Its running time is exponentially faster than that of naive algorithms, and it obtains state-of-the-art results on two large-scale public datasets.
Contribution
The paper derives a fast label embedding method from a correspondence between rank-constrained estimation and low-dimensional label embeddings. The method applies to both multiclass and multilabel tasks.
Findings
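The algorithm, a randomized method for partial least squares, has a running time exponentially faster than that of naive algorithms.
It achieves state-of-the-art results on two large-scale public datasets, from the Large Scale Hierarchical Text Challenge and the Open Directory Project.
It works in both the multiclass and multilabel settings.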
Abstract
Many modern multiclass and multilabel problems are characterized by increasingly large output spaces. For these problems, label embeddings have been shown to be a useful primitive that can improve computational and statistical efficiency. In this work we utilize a correspondence between rank constrained estimation and low dimensional label embeddings that uncovers a fast label embedding algorithm which works in both the multiclass and multilabel settings. The result is a randomized algorithm for partial least squares, whose running time is exponentially faster than naive algorithms. We demonstrate our techniques on two large-scale public datasets, from the Large Scale Hierarchical Text Challenge and the Open Directory Project, where we obtain state of the art results.
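To make the idea of a randomized label embedding concrete, here is a minimal NumPy sketch. It is not the authors' exact procedure: it applies a generic randomized range finder with power iterations to the partial least squares cross-product matrix X^T Y, and it never forms that d x c matrix explicitly. The function name `randomized_label_embedding` and the parameters `oversample`, `n_iter`, and `seed` are assumptions introduced for illustration only.

```python
import numpy as np

def randomized_label_embedding(X, Y, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k right singular vectors of M = X^T Y.

    X: (n, d) feature matrix; Y: (n, c) label matrix (one-hot or multilabel).
    Returns V of shape (c, k) with orthonormal columns; Y @ V is the embedded
    label matrix. Assumes k <= min(d, c). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    c = Y.shape[1]
    m = min(k + oversample, c)           # sketch width, capped at the label count
    Q = rng.standard_normal((c, m))      # random probe in label space
    for _ in range(n_iter):
        # Apply M^T M to Q using only products with X and Y.
        Z = X.T @ (Y @ Q)                # M Q, shape (d, m)
        Q, _ = np.linalg.qr(Y.T @ (X @ Z))  # orthonormal basis of M^T Z, shape (c, m)
    B = X.T @ (Y @ Q)                    # small projected problem, shape (d, m)
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Vt.T[:, :k]               # lift back to label space, shape (c, k)

# Synthetic multiclass example: 500 examples, 40 features, 1000 classes.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((500, 40))
labels = rng.integers(0, 1000, size=500)
Y_demo = np.zeros((500, 1000))
Y_demo[np.arange(500), labels] = 1.0     # one-hot label matrix
V_demo = randomized_label_embedding(X_demo, Y_demo, k=20)
Y_embedded = Y_demo @ V_demo             # each label vector mapped to 20 dimensions
```

A regressor trained to predict `Y_embedded` from `X_demo` then only needs 20 outputs instead of 1000. With a sparse `Y`, each product above costs time proportional to the number of nonzero labels rather than to n times c, which is where this kind of sketch saves work.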
Taxonomy
Topics: Text and Document Classification Technologies · Face and Expression Recognition · Machine Learning and Data Classification
