Fast Label Embeddings via Randomized Linear Algebra
Paul Mineiro, Nikos Karampatziakis

TL;DR
This paper introduces a fast randomized label-embedding algorithm that greatly reduces computation in large-scale multiclass and multilabel problems and achieves state-of-the-art results on two large-scale public datasets.
Contribution
It presents a randomized label-embedding algorithm derived from a correspondence between rank-constrained estimation and low-dimensional label embeddings; the algorithm applies to both multiclass and multilabel tasks.
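One way to make "rank-constrained estimation" concrete is classical reduced-rank least squares. The math below is a sketch under that assumption; the paper's exact formulation may differ. Take features X ∈ R^{n×d} with full column rank and label indicators Y ∈ R^{n×c}. Because Y - Ŷ is orthogonal to the column space of X, the objective splits as ||Y - Ŷ||² + ||Ŷ - XW||². Eckart-Young then makes the best rank-k fit Ŷ V_k V_kᵀ, which is attained by the W* below.

```latex
\min_{\operatorname{rank}(W)\le k} \lVert Y - XW \rVert_F^2,
\qquad \hat{Y} = X (X^\top X)^{-1} X^\top Y,
\qquad W^\star = (X^\top X)^{-1} X^\top Y \, V_k V_k^\top
```

Here V_k ∈ R^{c×k} holds the top-k right singular vectors of Ŷ. Row j of V_k can be read as a k-dimensional embedding of label j: a model predicts in the k-dimensional space with (X^\top X)^{-1} X^\top Y V_k, and V_kᵀ maps that prediction back to label scores.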
Findings
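The algorithm's running time is exponentially faster than that of naive algorithms.
Achieves state-of-the-art results on two large-scale public datasets (Large Scale Hierarchical Text Challenge and Open Directory Project).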
Effective in both multiclass and multilabel settings.
Abstract
Many modern multiclass and multilabel problems are characterized by increasingly large output spaces. For these problems, label embeddings have been shown to be a useful primitive that can improve computational and statistical efficiency. In this work we utilize a correspondence between rank-constrained estimation and low-dimensional label embeddings that uncovers a fast label embedding algorithm which works in both the multiclass and multilabel settings. The result is a randomized algorithm whose running time is exponentially faster than naive algorithms. We demonstrate our techniques on two large-scale public datasets, from the Large Scale Hierarchical Text Challenge and the Open Directory Project, where we obtain state-of-the-art results.
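The abstract describes the method only at a high level. What follows is a minimal Python/NumPy sketch, not the authors' implementation. It assumes the embedding is the top-k right singular vectors of the least-squares fit Ŷ = X(XᵀX)⁻¹XᵀY and approximates them with a standard randomized range finder plus subspace (power) iterations. The function name `randomized_label_embedding` and the parameters `oversample`, `n_iter` and `seed` are hypothetical.

```python
import numpy as np


def randomized_label_embedding(X, Y, k, oversample=10, n_iter=2, seed=0):
    """Hypothetical sketch: approximate the top-k right singular vectors of the
    least-squares fit Yhat = P_X Y (P_X = projection onto the range of X)
    using a randomized range finder with subspace iterations.

    X: (n, d) feature matrix; Y: (n, c) label-indicator matrix.
    Returns V: (c, k) with orthonormal columns; row j embeds label j in R^k.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be >= 1 in this sketch")
    rng = np.random.default_rng(seed)
    c = Y.shape[1]
    # Random Gaussian test matrix over the label dimension (c x (k + oversample)).
    Q = rng.standard_normal((c, k + oversample))
    for _ in range(n_iter):
        # Z solves min ||X Z - Y Q||_F, so X Z = P_X Y Q = Yhat Q.
        Z = np.linalg.lstsq(X, Y @ Q, rcond=None)[0]
        # Yhat^T Yhat Q = Y^T P_X Y Q = Y^T (X Z); orthonormalize for stability.
        Q, _ = np.linalg.qr(Y.T @ (X @ Z))
    # Restrict Yhat to the captured subspace, Yhat Q = X Z, and solve a small SVD.
    Z = np.linalg.lstsq(X, Y @ Q, rcond=None)[0]
    _, _, Vt = np.linalg.svd(X @ Z, full_matrices=False)
    # Lift the top-k right singular vectors of Yhat Q back to label space.
    return Q @ Vt[:k].T


# Usage on synthetic data whose least-squares fit has rank 4.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 30))
Y = X @ (rng.standard_normal((30, 4)) @ rng.standard_normal((4, 50)))
V = randomized_label_embedding(X, Y, k=4)
print(V.shape)  # (50, 4)
```

In this sketch, each least-squares solve has only k + `oversample` right-hand sides (the columns of Y Q) rather than one per label. For a multiclass problem, Y would be the one-hot indicator matrix, and the sketch does not attempt to reproduce the paper's running-time analysis.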
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Machine Learning and Algorithms
