Optimizing F-measure: A Tale of Two Approaches

Ye Nan (NUS); Kian Ming Chai (DSO National Laboratories); Wee Sun Lee (NUS); Hai Leong Chieu (DSO National Laboratories)

arXiv:1206.4625 · cs.LG · June 22, 2012 · 86 cites

TL;DR

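This paper compares the two main approaches to optimizing F-measure, a metric commonly used for imbalanced classification: empirical utility maximization (EUM) and the decision-theoretic approach. It analyzes their theoretical justifications and compares their performance on synthetic and real datasets.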

Contribution

It provides a theoretical analysis of the empirical utility maximization (EUM) and decision-theoretic approaches, and studies the conditions under which one is preferable to the other, including robustness to model misspecification.

Findings

1. Given accurate models, the two approaches are asymptotically equivalent as the training and test sets grow large.

2. Empirically, EUM appears more robust to model misspecification.

3. Given a good model, the decision-theoretic approach better handles rare classes and domain adaptation.

Abstract

F-measures are popular performance metrics, particularly for tasks with imbalanced data sets. Algorithms for learning to maximize F-measures follow two approaches: the empirical utility maximization (EUM) approach learns a classifier having optimal performance on training data, while the decision-theoretic approach learns a probabilistic model and then predicts labels with maximum expected F-measure. In this paper, we investigate the theoretical justifications and connections for these two approaches, and we study the conditions under which one approach is preferable to the other using synthetic and real datasets. Given accurate models, our results suggest that the two approaches are asymptotically equivalent given large training and test sets. Nevertheless, empirically, the EUM approach appears to be more robust against model misspecification, and given a good model, the…
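The contrast between the two approaches described in the abstract can be made concrete with a small sketch. This is an illustration under stated assumptions, not the paper's algorithms: it uses F1 (the balanced F-measure), treats EUM as picking a score threshold that maximizes F1 on training data, and treats the decision-theoretic approach as a brute-force search for the labeling with maximum expected F1 under independent label probabilities. The function names, the independence assumption, and the convention that F1 equals 1 when there are no positives in either the truth or the prediction are all choices made here for illustration.

```python
from itertools import product


def f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN).

    Assumed convention: return 1.0 when truth and prediction are both
    all-negative (the denominator is zero).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def eum_threshold(scores, labels):
    """EUM-style sketch: pick the threshold with the best F1 on training data."""
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f = f1(labels, preds)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


def expected_f1(probs, y_pred):
    """Expected F1 of y_pred when label i is 1 with probability probs[i].

    Labels are assumed independent; all 2^n label vectors are enumerated,
    so this is only feasible for very small n.
    """
    total = 0.0
    for y in product([0, 1], repeat=len(probs)):
        weight = 1.0
        for p, yi in zip(probs, y):
            weight *= p if yi == 1 else 1 - p
        total += weight * f1(y, y_pred)
    return total


def decision_theoretic_predict(probs):
    """Decision-theoretic sketch: labeling with maximum expected F1 (brute force)."""
    candidates = product([0, 1], repeat=len(probs))
    return max(candidates, key=lambda y_pred: expected_f1(probs, y_pred))


# EUM: the threshold is tuned directly against training labels.
threshold, train_f1 = eum_threshold([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])

# Decision-theoretic: predictions come from a probabilistic model's outputs.
prediction = decision_theoretic_predict([0.4, 0.4])
```

With these inputs, `eum_threshold` selects the threshold 0.4 (training F1 of 0.8), and `decision_theoretic_predict([0.4, 0.4])` returns `(1, 1)` with expected F1 of 0.48, even though each instance has probability below 0.5 of being positive. This shows why maximizing expected F-measure differs from thresholding probabilities at 0.5.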

Taxonomy

Topics: Imbalanced Data Classification Techniques · Text and Document Classification Technologies · Machine Learning and Data Classification