Learning label-label correlations in Extreme Multi-label Classification via Label Features
Siddhant Kharbanda, Devaansh Gupta, Erik Schultheis, Atmadeep Banerjee, Cho-Jui Hsieh, Rohit Babbar

TL;DR
This paper introduces Gandalf, a novel method for extreme multi-label classification that leverages label features and label co-occurrence graphs to generate additional training data, improving tail label prediction without extra computational costs.
Contribution
Gandalf is a new approach that uses label graphs and features to create supplementary training instances, enhancing existing XMC algorithms' performance on large-scale datasets.
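The idea of turning label features into supplementary training instances can be illustrated with a small sketch. This is not the authors' implementation: the function names `cooccurrence_targets` and `augment` are hypothetical, and the normalisation (dividing each co-occurrence count by the label's own frequency) is an assumption chosen for simplicity. The sketch treats each label's text as an extra input whose targets are the labels it co-occurs with in the training set.

```python
from collections import defaultdict


def cooccurrence_targets(train_labels, num_labels):
    """Return one soft-target dict per label, built from label co-occurrence.

    train_labels: list of label-id lists, one per training instance.
    rows[l][m] is the fraction of instances tagged with l that are also
    tagged with m (an assumed normalisation, not taken from the paper).
    """
    counts = [defaultdict(float) for _ in range(num_labels)]
    for labels in train_labels:
        unique = set(labels)
        for l in unique:
            for m in unique:
                # l == m is included, so counts[l][l] is the frequency of l.
                counts[l][m] += 1.0

    rows = []
    for l in range(num_labels):
        freq = counts[l].get(l, 0.0)
        if freq == 0.0:
            # Label never seen in training: it is only relevant to itself.
            rows.append({l: 1.0})
        else:
            rows.append({m: c / freq for m, c in counts[l].items()})
    return rows


def augment(train_texts, train_labels, label_texts):
    """Append one (label text, soft targets) pair per label to the data."""
    rows = cooccurrence_targets(train_labels, len(label_texts))
    data = [
        (text, {l: 1.0 for l in set(labels)})
        for text, labels in zip(train_texts, train_labels)
    ]
    data += [(label_texts[l], rows[l]) for l in range(len(label_texts))]
    return data
```

Because the extra pairs have the same shape as ordinary training pairs, any existing short-text XMC model can consume them unchanged, which is what makes this kind of augmentation plug-and-play.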
Findings
Models trained on the augmented data outperform models trained on the original datasets on PSP@k.
Gandalf achieves an average 5% improvement across multiple algorithms and datasets.
The method is plug-and-play and does not increase computational overhead.
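For readers unfamiliar with PSP@k (propensity-scored precision at k), the following minimal sketch shows how the metric rewards correct tail-label predictions. It assumes the per-label propensities are already given; how they are estimated is outside this sketch. The function name `psp_at_k` is hypothetical, and some benchmarks report a normalised variant that this sketch does not compute.

```python
def psp_at_k(scores, relevant, propensities, k):
    """Propensity-scored precision at k for a single instance.

    scores: predicted score for every label.
    relevant: set of ground-truth label ids.
    propensities: per-label propensity in (0, 1]; rare labels have small
        values, so a correct rare-label prediction earns a larger reward.
    """
    top_k = sorted(range(len(scores)), key=lambda l: scores[l], reverse=True)[:k]
    return sum(1.0 / propensities[l] for l in top_k if l in relevant) / k
```

With all propensities equal to 1, this reduces to ordinary precision at k.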
Abstract
Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign to an input a subset of the most relevant labels from millions of label choices. Recent works in this domain have increasingly focused on a symmetric problem setting where both input instances and label features are short-text in nature. Short-text XMC with label features has found numerous applications in areas such as query-to-ad-phrase matching in search ads, title-based product recommendation, and prediction of related searches. In this paper, we propose Gandalf, a novel approach which makes use of a label co-occurrence graph to leverage label features as additional data points to supplement the training distribution. By exploiting the characteristics of the short-text XMC problem, it leverages the label features to construct valid training instances, and uses the label graph for generating the…
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Gated Adaptive Network for Deep Automated Learning of Features
