Optimizing F-measure: A Tale of Two Approaches
Ye Nan (NUS), Kian Ming Chai (DSO National Laboratories), Wee Sun Lee (NUS), Hai Leong Chieu (DSO National Laboratories)

TL;DR
This paper compares two main approaches for optimizing F-measure in imbalanced data classification, analyzing their theoretical foundations and practical performance on synthetic and real datasets.
Contribution
It provides a theoretical analysis of the empirical utility maximization (EUM) and decision-theoretic approaches, and studies the conditions under which one is preferable to the other.
Findings
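Given accurate models, the two approaches are asymptotically equivalent for large training and test sets.
Empirically, EUM appears to be more robust to model misspecification.
Given a good model, the decision-theoretic approach appears to handle rare classes and domain adaptation better.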
Abstract
F-measures are popular performance metrics, particularly for tasks with imbalanced data sets. Algorithms for learning to maximize F-measures follow two approaches: the empirical utility maximization (EUM) approach learns a classifier having optimal performance on training data, while the decision-theoretic approach learns a probabilistic model and then predicts labels with maximum expected F-measure. In this paper, we investigate the theoretical justifications and connections for these two approaches, and we study the conditions under which one approach is preferable to the other using synthetic and real datasets. Given accurate models, our results suggest that the two approaches are asymptotically equivalent given large training and test sets. Nevertheless, empirically, the EUM approach appears to be more robust against model misspecification, and given a good model, the…
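The contrast between the two approaches described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`eum_threshold`, `expected_f1`, `decision_theoretic_predict`) are hypothetical, the EUM side is reduced to tuning a single score threshold on training data, and the decision-theoretic side assumes the labels are independent given the model's probabilities and uses brute-force enumeration, which is only feasible for a handful of instances. The convention that F1 equals 1.0 when there are no true and no predicted positives is also an assumption made here.

```python
from itertools import product


def f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); taken to be 1.0 here when the denominator is 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def eum_threshold(train_scores, train_labels):
    """EUM-style: pick the score threshold with the highest F1 on the training data."""
    best_t, best_f = None, -1.0
    for t in sorted(set(train_scores)):
        preds = [1 if s >= t else 0 for s in train_scores]
        f = f1(train_labels, preds)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


def expected_f1(probs, preds):
    """Exact expected F1 of `preds` when label i is 1 with probability probs[i],
    independently of the other labels (an assumption of this sketch)."""
    total = 0.0
    for labels in product([0, 1], repeat=len(probs)):
        weight = 1.0
        for p, y in zip(probs, labels):
            weight *= p if y == 1 else 1.0 - p
        total += weight * f1(labels, preds)
    return total


def decision_theoretic_predict(probs):
    """Decision-theoretic: return the label vector with maximum expected F1.
    Brute force over all 2^n label vectors, so only usable for tiny n."""
    return max(product([0, 1], repeat=len(probs)),
               key=lambda preds: expected_f1(probs, preds))


if __name__ == "__main__":
    # EUM: the threshold is chosen from observed training labels.
    print(eum_threshold([0.9, 0.8, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0]))
    # Decision-theoretic: predictions are chosen from model probabilities only.
    print(decision_theoretic_predict([0.45, 0.45]))
```

In the second call both instances have probability 0.45, so thresholding at 0.5 would label neither as positive, yet under the assumptions above the prediction with the highest expected F1 labels both as positive. This illustrates why maximizing expected F-measure is not the same as predicting the most probable label for each instance.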
Taxonomy
Topics: Imbalanced Data Classification Techniques · Text and Document Classification Technologies · Machine Learning and Data Classification
