Multi-Label Learning with Provable Guarantee

Sayantan Dasgupta

arXiv:1609.03426·cs.LG·November 2, 2016

TL;DR

This paper introduces a scalable multi-label learning model that uses higher-order moments for prediction, providing provable guarantees and efficient training on large text datasets with thousands of labels.

Contribution

It proposes a novel moment-based model with convergence guarantees that significantly improves training speed and scalability for high-dimensional multi-label classification.

Findings

1. Achieves 10x-15x speed-up on large datasets

2. Can train on millions of documents with hundreds of thousands of labels

3. Provides theoretical guarantees on parameter estimation

Abstract

Here we study the problem of learning labels for large text corpora where each text can be assigned a variable number of labels. The problem might seem trivial when the label dimensionality is small, since it can then be solved easily using a series of one-vs-all classifiers. However, as the label dimensionality increases to several thousand, the parameter space becomes extremely large, and it is no longer possible to use the one-vs-all technique. Here we propose a model based on the factorization of higher-order moments of the words in the corpora, as well as the cross moment between the labels and the words, for multi-label prediction. Our model provides guaranteed convergence bounds on the estimated parameters. Further, our model takes only three passes through the training dataset to extract the parameters, resulting in a highly scalable algorithm that can train on gigabytes of data consisting of…
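To make the quantities named in the abstract more concrete, the following is a minimal sketch of how empirical word moments and a label-word cross moment could be accumulated in one streaming pass over a corpus. It is not the paper's algorithm: the function name `accumulate_moments`, the per-document frequency normalization, and the dense list-of-lists storage are all assumptions made for illustration, and the factorization step itself is not shown.

```python
# Illustrative sketch only; names and normalization are hypothetical,
# not taken from the paper.

def accumulate_moments(docs, vocab_size, num_labels):
    """Accumulate empirical moments in a single pass over `docs`.

    `docs` yields (counts, labels) pairs: `counts` is a list of word counts
    of length `vocab_size`, `labels` is a list of label indices.
    Returns (m1, m2, myx): the mean word-frequency vector, the second-order
    word co-occurrence moment, and the label-word cross moment.
    """
    m1 = [0.0] * vocab_size
    m2 = [[0.0] * vocab_size for _ in range(vocab_size)]
    myx = [[0.0] * vocab_size for _ in range(num_labels)]
    n = 0
    for counts, labels in docs:
        total = sum(counts)
        if total == 0:
            continue  # skip empty documents
        n += 1
        # Assumed normalization: word frequencies within the document.
        x = [c / total for c in counts]
        nonzero = [(i, xi) for i, xi in enumerate(x) if xi > 0.0]
        for i, xi in nonzero:
            m1[i] += xi
            for j, xj in nonzero:
                m2[i][j] += xi * xj  # outer product x x^T
            for k in labels:
                myx[k][i] += xi      # outer product y x^T, y binary
    if n == 0:
        raise ValueError("no non-empty documents")
    m1 = [v / n for v in m1]
    m2 = [[v / n for v in row] for row in m2]
    myx = [[v / n for v in row] for row in myx]
    return m1, m2, myx


# Toy corpus: two documents over a 3-word vocabulary and 2 labels.
toy_docs = [([2, 0, 2], [0]), ([0, 4, 0], [1])]
m1, m2, myx = accumulate_moments(toy_docs, vocab_size=3, num_labels=2)
```

The dense `vocab_size × vocab_size` matrix is only workable for toy vocabularies; at the scale the abstract describes, a sparse or projected representation would presumably be needed.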

Taxonomy

Topics: Text and Document Classification Technologies · Natural Language Processing Techniques · Handwritten Text Recognition Techniques