Superior Scoring Rules for Probabilistic Evaluation of Single-Label Multi-Class Classification Tasks
Rouhollah Ahmadian, Mehdi Ghatee, Johan Wahlström

TL;DR
This paper proposes novel scoring rules, PBS and PLL, that improve probabilistic model evaluation by better rewarding correct classifications, leading to enhanced model selection and higher F1 scores.
Contribution
Introduction of Penalized Brier Score and Penalized Logarithmic Loss, novel proper scoring rules that incorporate penalties for misclassifications to improve model evaluation.
Findings
PBS correlates better with F1 score than Brier Score.
Models selected by PBS and PLL achieve higher F1 scores.
PBS improves early stopping and checkpointing decisions.
Abstract
This study introduces novel superior scoring rules called Penalized Brier Score (PBS) and Penalized Logarithmic Loss (PLL) to improve model evaluation for probabilistic classification. Traditional scoring rules like Brier Score and Logarithmic Loss sometimes assign better scores to misclassifications than to correct classifications. This conflicts with the practical preference for rewarding correct classifications and can lead to suboptimal model selection. By integrating penalties for misclassifications, PBS and PLL modify traditional proper scoring rules to consistently assign better scores to correct predictions. Formal proofs demonstrate that PBS and PLL satisfy strictly proper scoring rule properties while also preferentially rewarding accurate classifications. Experiments showcase the benefits of using PBS and PLL for model selection, model checkpointing, and early stopping.…
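The discrepancy described in the abstract can be reproduced with a small numeric case. The sketch below is illustrative only: it scores two hand-picked three-class predictions with the standard Brier Score and Logarithmic Loss, then applies a fixed additive penalty to misclassified predictions. The helper names (`penalized`, `pbs`, `pll`) and the penalty sizes, `(c - 1) / c` for the Brier Score and `log(c)` for the Logarithmic Loss, are assumptions made for this example. They were chosen because they bound the worst score a correct prediction can receive, and they are not confirmed by this page as the paper's exact definitions.

```python
import math

def brier_score(probs, true_idx):
    # Sum of squared differences between the forecast and the one-hot label (lower is better).
    return sum((p - (1.0 if k == true_idx else 0.0)) ** 2 for k, p in enumerate(probs))

def log_loss(probs, true_idx):
    # Negative log-probability assigned to the true class (lower is better).
    return -math.log(probs[true_idx])

def is_correct(probs, true_idx):
    # A prediction counts as correct when the true class has the highest probability.
    return max(range(len(probs)), key=probs.__getitem__) == true_idx

def penalized(score_fn, penalty):
    # Hypothetical wrapper: add a fixed penalty only when the prediction is a misclassification.
    def scorer(probs, true_idx):
        base = score_fn(probs, true_idx)
        return base if is_correct(probs, true_idx) else base + penalty
    return scorer

c = 3                                   # number of classes
true_idx = 0                            # index of the true class
correct_but_unsure = [0.34, 0.33, 0.33] # argmax is the true class
wrong_but_close = [0.49, 0.51, 0.00]    # argmax is class 1, a misclassification

# Assumed penalty sizes (not taken from this page).
pbs = penalized(brier_score, (c - 1) / c)
pll = penalized(log_loss, math.log(c))

for name, fn in [("Brier", brier_score), ("LogLoss", log_loss), ("PBS", pbs), ("PLL", pll)]:
    print(name, round(fn(correct_but_unsure, true_idx), 4), round(fn(wrong_but_close, true_idx), 4))
```

With the unpenalized rules, the misclassified forecast receives the lower (better) score in this case. With the assumed penalties, the ordering flips and the correct forecast scores better.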
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Text and Document Classification Technologies
Methods: Gradient Checkpointing · Early Stopping
