Large-scale Multi-label Learning with Missing Labels

Hsiang-Fu Yu; Prateek Jain; Purushottam Kar; Inderjit S. Dhillon

arXiv:1307.5101·cs.LG·November 26, 2013·369 cites

Open Access

TL;DR

This paper introduces a scalable empirical risk minimization (ERM) framework for multi-label classification with very large label sets and missing labels. It provides theoretical guarantees and outperforms existing methods on benchmark datasets.

Contribution

The paper proposes a unified ERM framework that handles both millions of labels and missing labels. The framework encompasses several recent label-compression methods as special cases and offers efficient algorithms with theoretical risk bounds.

Findings

01  Outperforms existing label compression methods on benchmarks

02  Provides tight excess risk bounds with missing labels

03  Scales efficiently to datasets like Wikipedia

Abstract

The multi-label classification problem has generated significant interest in recent years. However, existing approaches do not adequately address two key challenges: (a) the ability to tackle problems with a large number (say millions) of labels, and (b) the ability to handle data with missing labels. In this paper, we directly address both of these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework. Our framework, despite being simple, is surprisingly able to encompass several recent label-compression-based methods, which can be derived as special cases of our method. To optimize the ERM problem, we develop techniques that exploit the structure of specific loss functions, such as the squared loss function, to offer efficient algorithms. We further show that our learning framework admits formal excess risk bounds even in the presence of…
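To make the idea of ERM with squared loss over partially observed labels concrete, here is a minimal Python sketch. It rests on assumptions that the abstract does not spell out: the label predictor is taken to be a low-rank linear map `X @ W @ H.T`, the regularizer is a Frobenius-norm penalty on both factors, and plain gradient descent stands in for the paper's efficient algorithms, which the abstract does not detail. The names `objective`, `fit_low_rank_erm`, `mask`, `rank`, `lam`, and `lr` are hypothetical and chosen only for illustration.

```python
import numpy as np


def objective(X, Y, mask, W, H, lam):
    """Mean squared loss over OBSERVED label entries plus Frobenius penalties.

    X: (n, d) features; Y: (n, L) labels; mask: (n, L) with 1.0 = observed.
    Missing entries of Y must hold a finite filler value (e.g. 0); they are
    multiplied by zero in the mask and never contribute to the loss.
    """
    residual = mask * (X @ W @ H.T - Y)
    n_obs = max(mask.sum(), 1.0)
    penalty = 0.5 * lam * (np.sum(W ** 2) + np.sum(H ** 2))
    return 0.5 * np.sum(residual ** 2) / n_obs + penalty


def fit_low_rank_erm(X, Y, mask, rank=2, lam=1e-3, lr=0.05, steps=2000, seed=0):
    """Gradient descent on the masked objective (illustrative, not the paper's solver)."""
    rng = np.random.default_rng(seed)
    d, n_labels = X.shape[1], Y.shape[1]
    # Small random initialization of the two low-rank factors (assumed form).
    W = 0.1 * rng.standard_normal((d, rank))
    H = 0.1 * rng.standard_normal((n_labels, rank))
    n_obs = max(mask.sum(), 1.0)
    for _ in range(steps):
        # Residual restricted to observed entries, scaled to match `objective`.
        residual = mask * (X @ W @ H.T - Y) / n_obs
        grad_W = X.T @ residual @ H + lam * W
        grad_H = residual.T @ (X @ W) + lam * H
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H
```

Scores for new points would then be `X_new @ W @ H.T`, thresholded to obtain label predictions. The step size and iteration count are untuned defaults for small toy problems and would need adjusting on real data.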

Taxonomy

Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Machine Learning and Algorithms