A no-regret generalization of hierarchical softmax to extreme multi-label classification
Marek Wydmuch, Kalina Jasinska, Mikhail Kuznetsov, Róbert Busa-Fekete, Krzysztof Dembczyński

TL;DR
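This paper shows that probabilistic label trees (PLTs) are a no-regret generalization of hierarchical softmax (HSM) to extreme multi-label classification when precision@k is the evaluation metric, and that the authors' PLT implementation, extremeText, obtains significantly better results than HSM with the pick-one-label heuristic.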
Contribution
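It proves that probabilistic label trees are a no-regret generalization of hierarchical softmax for XMLC under precision@k, proves that the pick-one-label heuristic is not consistent in general, and shows empirically that the authors' implementation, extremeText, outperforms HSM with that heuristic.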
Findings
ExtremeText achieves better accuracy than hierarchical softmax with the pick-one-label heuristic.
ExtremeText is more efficient in model size and prediction time.
The pick-one-label heuristic is not consistent for multi-label classification.
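The inconsistency of the pick-one-label heuristic can be illustrated with a small worked example. The sketch below is not taken from the paper. It assumes a hypothetical conditional distribution over label subsets for a single instance, and it assumes the heuristic picks one of the relevant labels uniformly at random. The function names are illustrative. Under these assumptions, the label ranked first by the reduced multi-class distribution differs from the label with the highest marginal probability, which is the one that should be predicted to maximize precision@1.

```python
from fractions import Fraction

# Hypothetical distribution P(y | x) over label subsets for one instance x
# (an assumption made for illustration, not data from the paper).
label_set_probs = {
    frozenset({1}): Fraction(2, 5),
    frozenset({2, 3}): Fraction(3, 5),
}
labels = [1, 2, 3]


def marginals(dist, labels):
    """Marginal probability that each label is relevant."""
    return {j: sum(p for s, p in dist.items() if j in s) for j in labels}


def pick_one_label(dist, labels):
    """Multi-class distribution induced by picking one relevant label
    uniformly at random from each label subset."""
    return {j: sum(p / len(s) for s, p in dist.items() if j in s) for j in labels}


marg = marginals(label_set_probs, labels)
pol = pick_one_label(label_set_probs, labels)

# Top-1 prediction under each scoring (ties resolved by label order).
best_marginal = max(labels, key=marg.get)
best_pick_one = max(labels, key=pol.get)

print("marginals:     ", marg)
print("pick-one-label:", pol)
print("top-1 by marginals:", best_marginal, "| top-1 by pick-one-label:", best_pick_one)
```

Here labels 2 and 3 each have marginal probability 3/5, while label 1 has 2/5. After the reduction, the mass of the subset {2, 3} is split between its two labels, so label 1 is ranked first even though it is the least likely to be relevant.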
Abstract
Extreme multi-label classification (XMLC) is a problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. Large label spaces can be efficiently handled by organizing labels as a tree, like in the hierarchical softmax (HSM) approach commonly used for multi-class problems. In this paper, we investigate probabilistic label trees (PLTs) that have been recently devised for tackling XMLC problems. We show that PLTs are a no-regret multi-label generalization of HSM when precision@k is used as a model evaluation metric. Critically, we prove that pick-one-label heuristic - a reduction technique from multi-label to multi-class that is routinely used along with HSM - is not consistent in general. We also show that our implementation of PLTs, referred to as extremeText (XT), obtains significantly better results than HSM with the…
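As a rough illustration of how a label tree yields per-label probabilities, the sketch below computes each label's marginal as the product of conditional node probabilities along the path from the root to its leaf, then evaluates precision@k on the resulting ranking. The tree shape, the node probabilities, the set of relevant labels, and all names are hypothetical and chosen only for this example; this is not the extremeText implementation. Unlike in hierarchical softmax, the children's conditional probabilities at a node are not required to sum to 1, since several labels can be relevant at once.

```python
from fractions import Fraction

# Hypothetical tree: node 0 is the root, nodes 3..6 are leaves (one per label).
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
leaf_label = {3: "a", 4: "b", 5: "c", 6: "d"}

# Assumed outputs of the node classifiers for one instance x:
# cond[t] = P(node t is positive | parent of t is positive, x).
cond = {
    0: Fraction(1),
    1: Fraction(9, 10), 2: Fraction(2, 5),
    3: Fraction(1, 2), 4: Fraction(4, 5),
    5: Fraction(1, 2), 6: Fraction(3, 4),
}


def leaf_marginal(leaf):
    """Multiply conditional probabilities along the leaf-to-root path."""
    prob = Fraction(1)
    node = leaf
    while True:
        prob *= cond[node]
        if node not in parent:  # reached the root
            return prob
        node = parent[node]


def precision_at_k(ranked_labels, relevant, k):
    """Fraction of the top-k predicted labels that are relevant."""
    return Fraction(sum(1 for lab in ranked_labels[:k] if lab in relevant), k)


scores = {leaf_label[leaf]: leaf_marginal(leaf) for leaf in leaf_label}
ranking = sorted(scores, key=scores.get, reverse=True)

relevant = {"b", "d"}  # hypothetical ground-truth labels for x
print("scores:", scores)
print("ranking:", ranking)
print("precision@2:", precision_at_k(ranking, relevant, 2))
```

The example scores every leaf exhaustively for clarity. With an extremely large label set, a practical predictor would instead search the tree and skip subtrees whose path probability is already too low.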
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning and Data Classification
Methods: Hierarchical Softmax · Softmax
