Unbiased Loss Functions for Multilabel Classification with Missing Labels
Erik Schultheis, Rohit Babbar

TL;DR
This paper develops unbiased loss functions for multilabel classification with missing labels, addressing the bias introduced by missing data and proposing solutions that balance bias and variance in training.
Contribution
The paper derives the unique unbiased estimators for multilabel reductions, including non-decomposable ones, and proposes convex upper bounds to mitigate variance issues.
Findings
Unbiased estimators significantly change the bias-variance trade-off.
Switching to unbiased estimators may require stronger regularization.
Unbiased estimators can improve training in missing-label multilabel tasks.
Abstract
This paper considers binary and multilabel classification problems in a setting where labels are missing independently and with a known rate. Missing labels are a ubiquitous phenomenon in extreme multi-label classification (XMC) tasks, such as matching Wikipedia articles to a small subset out of the hundreds of thousands of possible tags, where no human annotator can possibly check the validity of all the negative samples. For this reason, propensity-scored precision -- an unbiased estimate for precision-at-k under a known noise model -- has become one of the standard metrics in XMC. Few methods take this problem into account already during the training phase, and all are limited to loss functions that can be decomposed into a sum of contributions from each individual label. A typical approach to training is to reduce the multilabel problem into a series of binary or multiclass…
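To make the setting concrete, here is a minimal sketch of the standard unbiased correction for a single binary label when positives go missing independently with a known propensity (the probability that a true positive is actually observed). It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of binary cross-entropy as the base loss, and the example numbers are all hypothetical.

```python
import math


def bce(label, score):
    """Binary cross-entropy of a logit `score` against a 0/1 `label`."""
    # log(1 + exp(-z)) evaluated stably, with z = score for label 1, -score for label 0.
    z = score if label == 1 else -score
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def unbiased_loss(observed, score, propensity, loss=bce):
    """Loss on the observed label whose expectation equals the loss on the true label.

    Assumes a true negative is never observed as positive, and a true
    positive is observed with probability `propensity` (hypothetical name).
    """
    if observed == 1:
        w = 1.0 / propensity
        # Up-weight the positive term and subtract the surplus negative term.
        return w * loss(1, score) + (1.0 - w) * loss(0, score)
    return loss(0, score)


def expected_over_missing(true_label, score, propensity, loss=bce):
    """Exact expectation of `unbiased_loss` over the label-dropping process."""
    if true_label == 0:
        return unbiased_loss(0, score, propensity, loss)
    kept = propensity * unbiased_loss(1, score, propensity, loss)
    dropped = (1.0 - propensity) * unbiased_loss(0, score, propensity, loss)
    return kept + dropped


if __name__ == "__main__":
    score, propensity = 0.7, 0.3
    print(expected_over_missing(1, score, propensity), bce(1, score))
    # The corrected loss can go negative for confident positive predictions.
    print(unbiased_loss(1, 5.0, 0.1))
```

The negative weight on the `loss(0, score)` term is what makes the corrected loss non-convex and unbounded below for observed positives, which is one way to see why variance grows and why convex upper bounds or stronger regularization become relevant.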
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Text and Document Classification Technologies
