On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification
Erik Schultheis, Marek Wydmuch, Rohit Babbar, Krzysztof Dembczyński

TL;DR
This paper critically examines the propensity model for handling missing and long-tail labels in extreme multi-label classification, highlighting its flaws and proposing alternative strategies inspired by search engines and recommender systems.
Contribution
It critically re-examines the propensity model, discusses its limitations, and presents several promising alternative approaches for XMLC.
Findings
The propensity model is theoretically sound but has practical flaws.
Alternative solutions inspired by search engines and recommender systems are promising.
The paper encourages exploring new approaches beyond propensity models.
Abstract
The propensity model introduced by Jain et al. 2016 has become a standard approach for dealing with missing and long-tail labels in extreme multi-label classification (XMLC). In this paper, we critically revise this approach showing that despite its theoretical soundness, its application in contemporary XMLC works is debatable. We exhaustively discuss the flaws of the propensity-based approach, and present several recipes, some of them related to solutions used in search engines and recommender systems, that we believe constitute promising alternatives to be followed in XMLC.
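To make the object of this critique concrete, the sketch below implements the empirical propensity formula of Jain et al. 2016 and the propensity-scored precision@k (PSP@k) metric built on it. The model assumes that a relevant label l is observed only with probability p_l, estimated from the label's training frequency N_l and the number of training instances N as p_l = 1 / (1 + C (N_l + B)^(-A)) with C = (ln N - 1)(B + 1)^A. PSP@k then weights each correct top-k prediction by 1/p_l, so rare labels count more. The function names are hypothetical, the defaults A = 0.55 and B = 1.5 are the values commonly used in XMLC benchmarks and should be treated as assumptions to be tuned per dataset, and only the unnormalized form of PSP@k is shown.

```python
import math
from collections.abc import Sequence


def jain_propensities(label_counts: Sequence[int], num_instances: int,
                      A: float = 0.55, B: float = 1.5) -> list[float]:
    """Hypothetical helper: empirical propensity p_l per label (Jain et al. 2016 formula).

    p_l = 1 / (1 + C * (N_l + B) ** (-A)),  C = (ln N - 1) * (B + 1) ** A
    A and B are assumed defaults; C is positive only when N > e.
    """
    C = (math.log(num_instances) - 1.0) * (B + 1.0) ** A
    # Frequent labels get propensities close to 1, rare labels much smaller ones.
    return [1.0 / (1.0 + C * (n_l + B) ** (-A)) for n_l in label_counts]


def psp_at_k(scores: Sequence[float], true_labels: set[int],
             propensities: Sequence[float], k: int) -> float:
    """Hypothetical helper: unnormalized propensity-scored precision@k for one instance."""
    # Indices of the k highest-scoring labels.
    top_k = sorted(range(len(scores)), key=lambda l: scores[l], reverse=True)[:k]
    # Each observed relevant label in the top k is up-weighted by 1 / p_l,
    # so a hit on a rare (low-propensity) label counts more than a hit on a head label.
    return sum(1.0 / propensities[l] for l in top_k if l in true_labels) / k
```

For example, with label counts [1000, 50, 2] over N = 10000 instances, the rarest label receives the smallest propensity and hence the largest weight, so correctly ranking it in the top k raises PSP@k far more than correctly ranking the head label, even though plain precision@k treats both hits equally.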
