From Lazy to Prolific: Tackling Missing Labels in Open Vocabulary Extreme Classification by Positive-Unlabeled Sequence Learning
Ranran Haoran Zhang, Bensu Uçar, Soumik Dey, Hansi Wu, Binbin Li, Rui Zhang

TL;DR
This paper introduces PUSL, a novel approach for open-vocabulary extreme multi-label classification that addresses missing labels and evaluation issues by reframing the task as an infinite keyphrase generation problem, leading to improved label generation and more reliable assessment.
Contribution
The paper proposes Positive-Unlabeled Sequence Learning (PUSL), a new method that tackles label laziness and evaluation unreliability in OXMC by reformulating it as a keyphrase generation task and introducing new metrics.
Findings
PUSL generates 30% more unique labels in imbalanced datasets.
72% of PUSL's predictions match actual user queries.
PUSL outperforms existing methods in F1 scores as label counts increase.
Abstract
Open-vocabulary Extreme Multi-label Classification (OXMC) extends traditional XMC by allowing prediction beyond an extremely large, predefined label set (typically to labels), addressing the dynamic nature of real-world labeling tasks. However, self-selection bias in data annotation leads to significant missing labels in both training and test data, particularly for less popular inputs. This creates two critical challenges: generation models learn to be "lazy" by under-generating labels, and evaluation becomes unreliable due to insufficient annotation in the test set. In this work, we introduce Positive-Unlabeled Sequence Learning (PUSL), which reframes OXMC as an infinite keyphrase generation task, addressing the generation model's laziness. Additionally, we propose to adopt a suite of evaluation metrics, F1@ and newly proposed B@, to reliably assess…
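To illustrate why missing test labels make evaluation unreliable, here is a minimal sketch of a generic set-based F1 over the top-k generated labels. It is not the paper's exact metric definition: the function name `f1_at_k`, the cutoff `k`, and the example labels are all assumptions made for illustration.

```python
def f1_at_k(predicted, gold, k):
    """Generic set-based F1 over the top-k predicted labels.

    Illustrative only: this is the textbook definition, assumed here,
    not the metric as defined in the paper.
    """
    # Deduplicate while keeping rank order, then truncate to k labels.
    top_k = list(dict.fromkeys(predicted))[:k]
    gold_set = set(gold)
    if not top_k or not gold_set:
        return 0.0
    hits = sum(1 for label in top_k if label in gold_set)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: a model generates four labels for one input.
predicted = ["running shoes", "trail shoes", "mens sneakers", "waterproof boots"]

# Sparse annotation: only one of the valid labels was ever recorded.
annotated_gold = ["running shoes"]

# What a complete annotation might have looked like (assumed).
complete_gold = ["running shoes", "trail shoes", "mens sneakers"]

score_sparse = f1_at_k(predicted, annotated_gold, k=4)
score_complete = f1_at_k(predicted, complete_gold, k=4)
print(f"F1@4 against sparse labels:   {score_sparse:.3f}")
print(f"F1@4 against complete labels: {score_complete:.3f}")
```

In this toy case the same predictions score much lower against the sparse annotation than against the complete one, because correct but unannotated labels are counted as false positives. A model trained and scored this way is rewarded for generating fewer labels, which is the "lazy" behavior the abstract describes.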
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Sparse Evolutionary Training · ALIGN
