Pseudo-Labeling Driven Refinement of Benchmark Object Detection Datasets via Analysis of Learning Patterns

Min Je Kim; Muhammad Munsif; Altaf Hussain; Hikmat Yar; Sung Wook Baik

arXiv:2506.00997 · cs.CV · June 3, 2025

TL;DR

This paper introduces MJ-COCO, a refined version of MS-COCO, created through a pseudo-labeling process that corrects annotation errors, leading to improved object detection model performance and increased annotation coverage.

Contribution

The paper presents a scalable pseudo-labeling framework for refining large-scale object detection datasets that improves annotation quality without requiring manual re-labeling.
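The basic pseudo-labeling idea can be illustrated with a minimal sketch. It assumes that a detector's confident predictions that overlap no existing annotation of the same class point to missing labels, and it adds them to the ground truth. The function names, the (category_id, box) tuple layout, and the score and IoU thresholds are hypothetical choices for illustration, not the authors' implementation or parameters.

```python
from typing import List, Tuple

# COCO-style box: (x, y, width, height) in pixels
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def merge_pseudo_labels(
    gt: List[Tuple[int, Box]],
    preds: List[Tuple[int, Box, float]],
    score_thr: float = 0.7,  # hypothetical confidence cut-off
    iou_thr: float = 0.5,    # hypothetical "already annotated" overlap
) -> List[Tuple[int, Box]]:
    """Add confident predictions that no same-class box already covers."""
    merged = list(gt)
    for cat, box, score in preds:
        if score < score_thr:
            continue  # too uncertain to trust as a new label
        # Compare against merged (not just gt) so duplicate predictions are added once.
        if all(c != cat or iou(box, b) < iou_thr for c, b in merged):
            merged.append((cat, box))
    return merged
```

Requiring both a high score and low overlap with existing boxes is a conservative choice: it gives up some missing labels in exchange for fewer spurious additions.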

Findings

01

Models trained on MJ-COCO outperform those trained on MS-COCO in both overall AP and small-object AP (APS).

02

MJ-COCO has over 200,000 more small-object annotations than MS-COCO.

03

The refinement process effectively reduces annotation errors and enhances dataset reliability.
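Finding 02 depends on how "small" is defined. The following is a minimal sketch, assuming the commonly cited COCO convention (small: area below 32² pixels; medium: below 96²; large: otherwise) applied to each annotation's `area` field. The paper's exact counting procedure is not stated here, and the helper names are hypothetical.

```python
import json
from collections import Counter

SMALL_MAX_AREA = 32 ** 2   # COCO "small": area < 32^2 pixels
MEDIUM_MAX_AREA = 96 ** 2  # COCO "medium": 32^2 <= area < 96^2


def size_bucket(area: float) -> str:
    """Map an annotation area (in pixels) to a COCO size bucket."""
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def count_by_size(coco: dict) -> Counter:
    """Count annotations per size bucket using each annotation's 'area' field."""
    return Counter(size_bucket(ann["area"]) for ann in coco["annotations"])


def load_coco(path: str) -> dict:
    """Load a COCO-format annotation JSON file."""
    with open(path) as f:
        return json.load(f)
```

Calling `count_by_size` on an original and a refined annotation file and subtracting the two "small" counts would yield the kind of difference reported above; the file paths depend on the reader's setup.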

Abstract

Benchmark object detection (OD) datasets play a pivotal role in advancing computer vision applications such as autonomous driving and surveillance, as well as in training and evaluating state-of-the-art deep learning-based detection models. Among them, MS-COCO has become a standard benchmark due to its diverse object categories and complex scenes. However, despite its wide adoption, MS-COCO suffers from various annotation issues, including missing labels, incorrect class assignments, inaccurate bounding boxes, duplicate labels, and group labeling inconsistencies. These errors not only hinder model training but also degrade the reliability and generalization of OD models. To address these challenges, we propose a comprehensive refinement framework and present MJ-COCO, a newly re-annotated version of MS-COCO. Our approach begins with loss and gradient-based error detection to identify…
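The abstract names loss and gradient-based error detection as the first step but is truncated before the details. One common way to realize such a step, sketched here as an assumption rather than the paper's procedure, is to record a per-annotation loss and gradient norm during training and flag annotations that are upper outliers on either statistic for review. The z-score rule, the threshold `k`, and the function names are hypothetical.

```python
import statistics
from typing import Dict, List


def zscores(values: List[float]) -> List[float]:
    """Standardize values; return zeros if they are all identical."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def flag_suspect_annotations(
    losses: Dict[int, float],
    grad_norms: Dict[int, float],
    k: float = 2.0,  # hypothetical outlier threshold in standard deviations
) -> List[int]:
    """Return ids whose loss or gradient norm exceeds mean + k * std.

    Both dicts map a (hypothetical) annotation id to a statistic
    collected during training; they must share the same keys.
    """
    ids = sorted(losses)
    loss_z = zscores([losses[i] for i in ids])
    grad_z = zscores([grad_norms[i] for i in ids])
    return [i for i, lz, gz in zip(ids, loss_z, grad_z) if lz > k or gz > k]
```

Flagged annotations would then be candidates for correction rather than automatically discarded, since a high loss can also come from genuinely hard but correctly labeled objects.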

Taxonomy

Topics: Machine Learning and Data Classification