Beyond mAP: Towards better evaluation of instance segmentation
Rohit Jena, Lukas Zhornyak, Nehal Doiphode, Pratik Chaudhari, Vivek Buch, James Gee, Jianbo Shi

TL;DR
This paper critiques Average Precision (AP), the metric commonly used to evaluate instance segmentation, and proposes new evaluation measures along with a module that reduces duplicate predictions, so that models can be assessed more reliably.
Contribution
The paper introduces two new measures that explicitly quantify spatial and categorical duplicate predictions, together with a Semantic Sorting and NMS module that reduces such duplicates.
Findings
Modern networks show high AP but many duplicates
Proposed metrics effectively quantify duplicate predictions
Semantic Sorting and NMS reduces false positives
Abstract
Correctness of instance segmentation constitutes counting the number of objects, correctly localizing all predictions and classifying each localized prediction. Average Precision is the de-facto metric used to measure all these constituents of segmentation. However, this metric does not penalize duplicate predictions in the high-recall range, and cannot distinguish instances that are localized correctly but categorized incorrectly. This weakness has inadvertently led to network designs that achieve significant gains in AP but also introduce a large number of false positives. We therefore cannot rely on AP to choose a model that provides an optimal tradeoff between false positives and high recall. To resolve this dilemma, we review alternative metrics in the literature and propose two new measures to explicitly measure the amount of both spatial and categorical duplicate predictions. We…
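The claim that AP does not penalize duplicate predictions in the high-recall range can be illustrated with a small numeric sketch. The code below is not the paper's evaluation code: it is a simplified, all-point interpolated AP for a single class at a single IoU threshold, and the function name `average_precision` and the toy inputs are hypothetical. It assumes predictions are already sorted by descending confidence and that each duplicate has been marked as a false positive by the matching step.

```python
def average_precision(flags, num_gt):
    """Simplified interpolated AP (illustrative assumption, not COCO's exact protocol).

    flags: booleans sorted by descending confidence, True = true positive.
    num_gt: number of ground-truth instances.
    """
    tp = 0
    precisions, recalls = [], []
    for rank, is_tp in enumerate(flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)

    # Interpolation: precision at a point is the best precision at any later point.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the resulting step-shaped precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Three ground-truth objects, all found by the three most confident predictions.
clean = [True, True, True]
# Same detections, plus three low-confidence duplicates (counted as false positives).
with_duplicates = clean + [False, False, False]

ap_clean = average_precision(clean, num_gt=3)
ap_dup = average_precision(with_duplicates, num_gt=3)

# Precision over the full prediction set, which is what a downstream user sees.
precision_clean = sum(clean) / len(clean)
precision_dup = sum(with_duplicates) / len(with_duplicates)

print(ap_clean, ap_dup, precision_clean, precision_dup)
```

In this toy case both prediction sets receive the same AP, while the precision over all emitted predictions falls from 1.0 to 0.5 once the duplicates are added. Recall is already saturated when the duplicates appear, so they contribute no area to the curve, which is the blind spot the abstract describes.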
Taxonomy
Topics: Scientific Computing and Data Management · Machine Learning and Data Classification · Topic Modeling
