Dealing with a large number of classes -- Likelihood, Discrimination or Ranking?
David Barber, Aleksandar Botev

TL;DR
When there are too many classes to normalise over exactly, a probabilistic classifier can be trained by directly approximating the likelihood. The paper compares this approach with non-likelihood-based alternatives and connects it to a ranking objective, which yields a suggested threshold setting.
Contribution
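The paper proposes a simple approach that directly approximates the likelihood when exact normalisation over all classes is infeasible, relates this approach to a simple ranking objective, and uses that relation to suggest a specific setting for the threshold in the ranking objective.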
Findings
The direct likelihood approximation works well on toy problems
It is competitive with recently introduced non-likelihood-based approximations
Relating the approximation to a simple ranking objective suggests a specific setting for the optimal threshold in that objective
Abstract
We consider training probabilistic classifiers in the case of a large number of classes. The number of classes is assumed too large to perform exact normalisation over all classes. To account for this we consider a simple approach that directly approximates the likelihood. We show that this simple approach works well on toy problems and is competitive with recently introduced alternative non-likelihood based approximations. Furthermore, we relate this approach to a simple ranking objective. This leads us to suggest a specific setting for the optimal threshold in the ranking objective.
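The abstract does not spell out the approximation or the ranking objective, so the sketch below is only an illustration of the general idea and not the authors' method. It assumes a softmax classifier, replaces the sum over non-target classes in the normaliser with an importance-weighted sum over uniformly sampled negative classes, and pairs this with a generic logistic ranking loss whose threshold is left as a free parameter. All function names, the uniform sampling scheme, and the form of the ranking loss are assumptions made for illustration.

```python
import math
import random


def exact_log_likelihood(scores, target):
    """Exact softmax log-likelihood: s_target - log sum_d exp(s_d)."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z


def sample_negatives(num_classes, target, num_samples, rng):
    """Draw class indices uniformly (with replacement) from all classes except target."""
    negatives = []
    for _ in range(num_samples):
        c = rng.randrange(num_classes - 1)
        if c >= target:
            c += 1  # skip over the target index
        negatives.append(c)
    return negatives


def sampled_log_likelihood(scores, target, num_samples, rng):
    """Approximate the log-likelihood with a sampled normaliser (assumed scheme).

    The sum over the C - 1 non-target classes is estimated by
    (C - 1) / K times the sum over K uniformly sampled negatives.
    """
    num_classes = len(scores)
    negatives = sample_negatives(num_classes, target, num_samples, rng)
    weight = (num_classes - 1) / num_samples
    m = max([scores[target]] + [scores[c] for c in negatives])
    z_hat = math.exp(scores[target] - m)
    z_hat += weight * sum(math.exp(scores[c] - m) for c in negatives)
    return scores[target] - (m + math.log(z_hat))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranking_loss(scores, target, negatives, threshold):
    """Generic logistic ranking loss (assumed form, not taken from the paper).

    Penalises each negative class whose score is not at least
    `threshold` below the score of the target class.
    """
    margins = [scores[target] - scores[c] - threshold for c in negatives]
    return sum(softplus(-margin) for margin in margins) / len(margins)
```

In this sketch the cost per training example scales with the number of sampled negatives rather than with the total number of classes. The threshold in `ranking_loss` is deliberately left as an argument: the abstract states that a specific setting is suggested but does not give its value.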
