Hierarchical Scoring for Machine Learning Classifier Error Impact Evaluation
Erin Lanus, Daniel Wolodkin, and Laura J. Freeman

TL;DR
This paper introduces hierarchical scoring metrics for machine learning classifiers that account for the relationships between class labels, so that evaluation reflects how far a misclassification falls from the ground truth rather than only whether the prediction passed or failed.
Contribution
The work develops and demonstrates hierarchical scoring metrics using scoring trees to encode class relationships, offering a more granular evaluation of model errors.
Findings
Hierarchical metrics capture error impact with finer granularity.
Scoring trees enable tuning of error evaluation strategies.
Metrics reflect the distance between predicted and true labels in a hierarchy.
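To make the distance-based idea in these findings concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's actual metric: the toy taxonomy, the names `PARENT`, `tree_distance`, and `partial_credit`, and the linear scaling of credit by edge count are all hypothetical choices made for this example.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "vehicle": None,
    "car": "vehicle",
    "truck": "vehicle",
    "sedan": "car",
    "suv": "car",
    "pickup": "truck",
}


def path_to_root(label):
    """Return the list of nodes from `label` up to the root, inclusive."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def tree_distance(a, b):
    """Count the edges between two labels via their lowest common ancestor."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for steps_from_b, node in enumerate(path_to_root(b)):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    raise ValueError("labels do not share a root")


def partial_credit(pred, truth):
    """Assumed scoring rule: 1.0 for an exact match, decreasing linearly
    with tree distance, and 0.0 for the most distant pair in the tree."""
    max_dist = max(tree_distance(x, y) for x in PARENT for y in PARENT)
    return 1.0 - tree_distance(pred, truth) / max_dist


print(partial_credit("sedan", "sedan"))   # exact match
print(partial_credit("suv", "sedan"))     # sibling under "car"
print(partial_credit("pickup", "sedan"))  # different branch of the tree
```

Under pass/fail scoring the last two predictions would both count as equally wrong. In this sketch the sibling error keeps half credit while the cross-branch error gets none. Changing the tree shape, or replacing the edge count with per-edge weights, is one way such a scheme could be tuned to an operator's valuation of errors.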
Abstract
A common use of machine learning (ML) models is predicting the class of a sample. Object detection is an extension of classification that includes localization of the object via a bounding box within the sample. Classification, and by extension object detection, is typically evaluated by counting a prediction as incorrect if the predicted label does not match the ground truth label. This pass/fail scoring treats all misclassifications as equivalent. In many cases, class labels can be organized into a class taxonomy with a hierarchical structure to reflect either relationships among the data or operator valuation of misclassifications. When such a hierarchical structure exists, hierarchical scoring metrics can return the model performance of a given prediction related to the distance between the prediction and the ground truth label. Such metrics can be viewed as giving partial credit to…
