Benchmarking a Benchmark: How Reliable is MS-COCO?
Eric Zimmermann, Justin Szeto, Jerome Pasquero, Frederic Ratle

TL;DR
This paper critically examines the reliability of the MS-COCO dataset by re-annotating it as Sama-COCO, revealing biases and emphasizing the importance of annotation styles for task-specific model training and evaluation.
Contribution
It introduces Sama-COCO, a re-annotated version of MS-COCO, and analyzes how annotation styles influence model performance and bias detection.
Findings
Annotation styles significantly affect model evaluation.
Biases in datasets can impact learned representations.
Careful consideration of annotation pipelines is crucial for task relevance.
Abstract
Benchmark datasets are used to profile and compare algorithms across a variety of tasks, ranging from image classification to segmentation, and also play a large role in image pretraining algorithms. Emphasis is typically placed on results, with little regard for the actual content of the datasets. It is important to question what kind of information is being learned from these datasets and what nuances and biases they contain. In this work, Sama-COCO, a re-annotation of MS-COCO, is used to discover potential biases by leveraging a shape analysis pipeline. A model is trained and evaluated on both datasets to examine the impact of different annotation conditions. Results demonstrate that annotation styles are important and that annotation pipelines should closely consider the task of interest. The dataset is made publicly available at https://www.sama.com/sama-coco-dataset/.
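To make the idea of comparing two annotation sets concrete, here is a minimal sketch of one way to pair the annotations for a single image across an original and a re-annotated set. It is not the paper's shape analysis pipeline. It assumes COCO-style annotation dictionaries with `category_id` and `bbox` given as `[x, y, width, height]`; the function names `box_iou` and `match_annotations` and the 0.5 threshold are hypothetical choices made for illustration.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def match_annotations(anns_a, anns_b, iou_threshold=0.5):
    """Greedily pair same-category annotations from two sets for one image.

    Returns (matches, unmatched_a, unmatched_b), where matches is a list of
    (index_in_a, index_in_b, iou) tuples and the other two are index lists.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    candidates = []
    for i, a in enumerate(anns_a):
        for j, b in enumerate(anns_b):
            if a["category_id"] != b["category_id"]:
                continue
            iou = box_iou(a["bbox"], b["bbox"])
            if iou >= iou_threshold:
                candidates.append((iou, i, j))

    # Highest-overlap pairs are accepted first; each annotation is used once.
    candidates.sort(reverse=True)
    used_a, used_b, matches = set(), set(), []
    for iou, i, j in candidates:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        matches.append((i, j, iou))

    unmatched_a = [i for i in range(len(anns_a)) if i not in used_a]
    unmatched_b = [j for j in range(len(anns_b)) if j not in used_b]
    return matches, unmatched_a, unmatched_b
```

Under these assumptions, matched pairs with low overlap point to differences in how tightly objects were outlined, while unmatched annotations point to objects that only one annotation effort labeled, or to groups that one effort split into individual instances.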
Taxonomy
Topics: Digital Imaging for Blood Diseases · AI in cancer detection · Image Retrieval and Classification Techniques
