On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification

Erik Schultheis; Marek Wydmuch; Rohit Babbar; Krzysztof Dembczyński

arXiv:2207.13186 · cs.LG · July 28, 2022

TL;DR

This paper critically examines the propensity model for handling missing and long-tail labels in extreme multi-label classification (XMLC), highlighting its flaws and proposing alternative strategies inspired by search engines and recommender systems.

Contribution

The paper critically revisits the propensity model, discusses its limitations, and presents alternative recipes that the authors consider promising for XMLC.

Findings

01 The propensity model is theoretically sound, but its application in contemporary XMLC work has practical flaws.

02 Alternative solutions drawn from search engines and recommender systems are promising.

03 The paper encourages exploring approaches beyond propensity models.

Abstract

The propensity model introduced by Jain et al. (2016) has become a standard approach for dealing with missing and long-tail labels in extreme multi-label classification (XMLC). In this paper, we critically revise this approach, showing that despite its theoretical soundness, its application in contemporary XMLC works is debatable. We exhaustively discuss the flaws of the propensity-based approach and present several recipes, some of them related to solutions used in search engines and recommender systems, that we believe constitute promising alternatives to be followed in XMLC.
