Multi-Label Learning with Provable Guarantee
Sayantan Dasgupta

TL;DR
This paper introduces a scalable multi-label learning model that uses higher-order moments for prediction, providing provable guarantees and efficient training on large text datasets with thousands of labels.
Contribution
It proposes a novel moment-based model with convergence guarantees that significantly improves training speed and scalability for high-dimensional multi-label classification.
Findings
Achieves 10x-15x speed-up on large datasets
Can train on millions of documents with hundreds of thousands of labels
Provides theoretical guarantees on parameter estimation
Abstract
Here we study the problem of learning labels for large text corpora where each text can be assigned a variable number of labels. When the label dimensionality is small, the problem may seem trivial, since it can be solved with a series of one-vs-all classifiers. However, as the label dimensionality increases to several thousand, the parameter space becomes extremely large, and it is no longer possible to use the one-vs-all technique. Here we propose a model based on the factorization of higher-order moments of the words in the corpora, as well as the cross moment between the labels and the words, for multi-label prediction. Our model provides guaranteed convergence bounds on the estimated parameters. Further, our model takes only three passes through the training dataset to extract the parameters, resulting in a highly scalable algorithm that can train on gigabytes of data consisting of…
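To make the moment quantities in the abstract concrete, the following is a minimal sketch of how empirical word moments and a label-word cross moment might be accumulated in a single streaming pass. It is an illustration under stated assumptions, not the paper's estimator: the function name `accumulate_moments`, the length normalization, and the toy data are all hypothetical, the second moment here is a plain uncentered co-occurrence average, and the factorization step that recovers model parameters is omitted entirely.

```python
import numpy as np

def accumulate_moments(batches, vocab_size, num_labels):
    """Stream over (X, Y) mini-batches and return empirical moments.

    X: (n, vocab_size) bag-of-words counts; Y: (n, num_labels) 0/1 labels.
    Hypothetical helper for illustration; not taken from the paper.
    """
    m1 = np.zeros(vocab_size)                    # first moment of words
    m2 = np.zeros((vocab_size, vocab_size))      # second moment (word-word)
    cross = np.zeros((num_labels, vocab_size))   # label-word cross moment
    n_docs = 0
    for X, Y in batches:
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        # Assumption: normalize each document to word frequencies so that
        # long documents do not dominate the averages.
        lengths = X.sum(axis=1, keepdims=True)
        lengths[lengths == 0] = 1.0              # avoid division by zero
        F = X / lengths
        m1 += F.sum(axis=0)
        m2 += F.T @ F
        cross += Y.T @ F
        n_docs += X.shape[0]
    return m1 / n_docs, m2 / n_docs, cross / n_docs

# Toy corpus: 4 documents, 3 vocabulary words, 2 labels, in two batches.
X_all = np.array([[2, 0, 1], [0, 3, 0], [1, 1, 1], [0, 0, 4]])
Y_all = np.array([[1, 0], [0, 1], [1, 1], [0, 1]])
batches = [(X_all[:2], Y_all[:2]), (X_all[2:], Y_all[2:])]

m1, m2, cross = accumulate_moments(batches, vocab_size=3, num_labels=2)
print(m1.shape, m2.shape, cross.shape)
```

Because every statistic is a running sum, the data never has to fit in memory at once, which is the property that makes a fixed number of passes attractive at scale. The dense `vocab_size x vocab_size` array is only workable for a toy vocabulary; a realistic corpus would need sparse matrices or a low-rank projection, and that choice is likewise an assumption beyond what the abstract states.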
Taxonomy
Topics: Text and Document Classification Technologies · Natural Language Processing Techniques · Handwritten Text Recognition Techniques
