Extreme Multi-label Classification from Aggregated Labels
Yanyao Shen, Hsiang-fu Yu, Sujay Sanghavi, Inderjit Dhillon

TL;DR
This paper introduces a scalable algorithm for extreme multi-label classification when labels are only available at the group level, enabling effective label imputation and improved performance on large-scale MIML tasks.
Contribution
It presents a novel, scalable algorithm for imputing individual labels from group labels in XMC, extends it into an end-to-end framework for MIML, and demonstrates better results than existing methods.
Findings
The algorithm effectively imputes individual labels from group labels.
The approach scales to large XMC and MIML datasets.
Experiments on both aggregated-label XMC and MIML tasks show improvements over existing methods.
Abstract
Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input from a very large universe of possible labels. We consider XMC in the setting where labels are available only for groups of samples, not for individual ones. Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes. We develop a new, scalable algorithm to impute individual-sample labels from the group labels; it can be paired with any existing XMC method to solve the aggregated-label problem. We characterize the statistical properties of our algorithm under mild assumptions and, as an extension, provide a new end-to-end framework for MIML. Experiments on both aggregated-label XMC and MIML tasks show the advantages over existing approaches.
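To make the aggregated-label setting concrete, here is a minimal Python (NumPy) sketch of one simple imputation heuristic. It is not the authors' algorithm. It assumes that every label in a group's label set belongs to at least one member of the group, and it assumes a hypothetical per-label score vector (`label_prototypes`), for example one obtained from a model fitted on group-level data. Each group label is assigned to the member instance that scores highest for it, so the union of imputed labels within a group reproduces that group's label set.

```python
import numpy as np


def impute_instance_labels(X, groups, group_labels, label_prototypes):
    """Hypothetical heuristic: push each group label down to its best-scoring member.

    X                -- (n, d) array of instance features
    groups           -- list of groups, each a list of instance indices
    group_labels     -- list of label-id sets, one per group (the aggregated labels)
    label_prototypes -- (L, d) array, one score vector per label (assumed given)
    Returns a list of n label-id sets, one per instance.
    """
    imputed = [set() for _ in range(X.shape[0])]
    for members, labels in zip(groups, group_labels):
        members = np.asarray(members)
        for lab in labels:
            # Linear score of every member instance for this label.
            scores = X[members] @ label_prototypes[lab]
            # Assumption: the label belongs to the single highest-scoring member.
            best = int(members[int(np.argmax(scores))])
            imputed[best].add(lab)
    return imputed


# Two groups of two instances each; only group-level labels are observed.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
groups = [[0, 1], [2, 3]]
group_labels = [{0, 1}, {0}]
label_prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])

print(impute_instance_labels(X, groups, group_labels, label_prototypes))
# [{0}, {1}, {0}, set()]
```

The imputed per-instance label sets could then serve as ordinary training labels for any off-the-shelf XMC learner, which is the kind of pairing the abstract describes. A single-argmax rule is only one illustrative choice; it never assigns a label to more than one member of a group, so a practical method would also need to handle groups in which several instances share a label.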
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Machine Learning and Algorithms
