Thresholding Classifiers to Maximize F1 Score
Zachary Chase Lipton, Charles Elkan, Balakrishnan Narayanaswamy

TL;DR
This paper analyzes how to choose optimal decision thresholds for classifiers to maximize F1 scores in binary and multilabel settings, providing formulas and insights for well-calibrated and uninformative classifiers, with a case study on medical document labeling.
Contribution
It derives the relationship between classifier thresholds and maximum achievable F1 scores, offering practical guidelines for threshold selection in various classification scenarios.
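For reference, the quantity being maximized can be written in terms of confusion-matrix counts. The symbols tp, fp, and fn (true positives, false positives, false negatives) are notation chosen here for illustration and may differ from the paper's own.

```latex
F_1 \;=\; \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
    \;=\; \frac{2\,tp}{2\,tp + fp + fn},
\qquad
\mathrm{precision} = \frac{tp}{tp + fp},
\quad
\mathrm{recall} = \frac{tp}{tp + fn}
```

True negatives do not appear in the expression, which is why F1 is favored when the positive class is rare.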
Findings
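For well-calibrated conditional probabilities, the optimal threshold is half the optimal F1 score.
A completely uninformative classifier maximizes F1 by classifying every example as positive.
In multilabel classification, applying the F1-optimal thresholding procedure can produce surprising results, as the medical document case study illustrates.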
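The first two findings can be checked numerically with a minimal sketch. It assumes the scores are perfectly calibrated probabilities and scores each threshold by the F1 of the expected confusion counts, which is a simplification rather than the paper's derivation. The function names `expected_f1` and `best_threshold` and the example scores are hypothetical, chosen only for illustration.

```python
def expected_f1(probs, threshold):
    """F1 of expected counts when every example with p >= threshold is labeled positive.

    Assumes each p is a calibrated probability that its example is positive.
    """
    tp = sum(p for p in probs if p >= threshold)        # expected true positives
    fp = sum(1 - p for p in probs if p >= threshold)    # expected false positives
    fn = sum(p for p in probs if p < threshold)         # expected false negatives
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def best_threshold(probs):
    """Try each distinct score as a threshold; return the (threshold, F1) pair with highest F1."""
    candidates = ((t, expected_f1(probs, t)) for t in sorted(set(probs)))
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Informative, calibrated scores (made-up values).
    scores = [0.05, 0.1, 0.2, 0.4, 0.7, 0.9]
    t, f = best_threshold(scores)
    print("best threshold:", t, "F1:", f, "F1/2:", f / 2)

    # Uninformative classifier: every example gets the base rate b = 0.1.
    flat = [0.1] * 10
    t_flat, f_flat = best_threshold(flat)
    print("flat threshold:", t_flat, "F1:", f_flat)
```

In the first case, half of the best F1 falls between the lowest score labeled positive and the highest score labeled negative, so thresholding at F1/2 selects the same set. In the second case every score equals the base rate b, the only non-trivial choice is to label everything positive, and the resulting F1 is 2b/(1+b).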
Abstract
This paper provides new insight into maximizing F1 scores in binary classification and in multilabel classification. The F1 score, the harmonic mean of precision and recall, is widely used to measure the success of a binary classifier when one class is rare. Micro-average, macro-average, and per-instance-average F1 scores are used in multilabel classification. For any classifier that produces a real-valued output, we derive the relationship between the best achievable F1 score and the decision-making threshold that achieves this optimum. As a special case, if the classifier outputs are well-calibrated conditional probabilities, then the optimal threshold is half the optimal F1 score. As another special case, if the classifier is completely uninformative, then the optimal behavior is to classify all examples as positive. Since the actual prevalence of…
Taxonomy
Topics: Text and Document Classification Technologies · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
