Pseudo-Labeling Driven Refinement of Benchmark Object Detection Datasets via Analysis of Learning Patterns
Min Je Kim, Muhammad Munsif, Altaf Hussain, Hikmat Yar, Sung Wook Baik

TL;DR
This paper introduces MJ-COCO, a refined version of MS-COCO, created through a pseudo-labeling process that corrects annotation errors, leading to improved object detection model performance and increased annotation coverage.
Contribution
The paper presents a novel, scalable pseudo-labeling framework for refining large-scale object detection datasets, significantly improving annotation quality without manual re-labeling.
Findings
Models trained on MJ-COCO outperform those trained on MS-COCO on the AP and AP_S (small-object AP) metrics.
MJ-COCO has over 200,000 more small object annotations than MS-COCO.
The refinement process effectively reduces annotation errors and enhances dataset reliability.
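The small-object comparison above can be reproduced in spirit by counting annotations per size bucket. The sketch below assumes COCO-format annotation dictionaries (each annotation has an `area` field in pixels) and the standard COCO convention that a "small" object has an area below 32 × 32 pixels. The function name and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# COCO evaluation convention: "small" objects have area < 32^2 pixels.
SMALL_AREA_MAX = 32 ** 2


def count_small_annotations(coco: Dict[str, List[dict]]) -> int:
    """Count annotations whose pixel area falls in the COCO 'small' bucket."""
    return sum(
        1 for ann in coco.get("annotations", [])
        if ann["area"] < SMALL_AREA_MAX
    )


# Toy stand-in for a COCO-format annotation file (hypothetical values).
toy_dataset = {
    "annotations": [
        {"id": 1, "image_id": 10, "category_id": 1, "area": 100.0},
        {"id": 2, "image_id": 10, "category_id": 3, "area": 900.0},
        {"id": 3, "image_id": 11, "category_id": 1, "area": 1024.0},
        {"id": 4, "image_id": 12, "category_id": 2, "area": 50000.0},
    ]
}

small_count = count_small_annotations(toy_dataset)
```

Running the same count on two annotation files loaded with `json.load` and subtracting the results would give a difference comparable to the one reported above, assuming both files follow the COCO schema.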
Abstract
Benchmark object detection (OD) datasets play a pivotal role in advancing computer vision applications such as autonomous driving and surveillance, as well as in training and evaluating deep learning-based state-of-the-art detection models. Among them, MS-COCO has become a standard benchmark due to its diverse object categories and complex scenes. However, despite its wide adoption, MS-COCO suffers from various annotation issues, including missing labels, incorrect class assignments, inaccurate bounding boxes, duplicate labels, and group labeling inconsistencies. These errors not only hinder model training but also degrade the reliability and generalization of OD models. To address these challenges, we propose a comprehensive refinement framework and present MJ-COCO, a newly re-annotated version of MS-COCO. Our approach begins with loss and gradient-based error detection to identify…
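The abstract is cut off before it describes the error-detection step, so the following is only a generic illustration of loss-based flagging and not the authors' procedure. It assumes a per-annotation training loss has already been recorded for each label and flags annotations whose loss sits more than `k` standard deviations above the mean. The function name, the threshold rule, and the sample values are all assumptions made for this sketch.

```python
from statistics import mean, pstdev
from typing import Dict, List


def flag_suspect_annotations(losses: Dict[int, float], k: float = 2.0) -> List[int]:
    """Return annotation ids whose loss exceeds mean + k * std (a simple outlier rule).

    This is an illustrative heuristic, not the detection rule used in the paper.
    """
    values = list(losses.values())
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All losses are identical, so nothing stands out.
        return []
    threshold = mu + k * sigma
    return sorted(ann_id for ann_id, loss in losses.items() if loss > threshold)


# Hypothetical per-annotation losses; annotation 10 is far above the rest.
sample_losses = {
    1: 0.10, 2: 0.12, 3: 0.09, 4: 0.11, 5: 0.10,
    6: 0.13, 7: 0.08, 8: 0.10, 9: 0.11, 10: 2.50,
}

suspects = flag_suspect_annotations(sample_losses)
```

A label that the model keeps fitting poorly is a candidate for review because persistent high loss can indicate a wrong class or a badly placed box, although it can also simply mark a hard but correct example.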
Taxonomy
Topics: Machine Learning and Data Classification
