Self-Supervised Learning for Real-World Object Detection: a Survey

Alina Ciocarlan; Sidonie Lefebvre; Sylvie Le Hégarat-Mascle; Arnaud Woiselle

arXiv:2410.07442·cs.CV·May 1, 2026

TL;DR

This survey reviews self-supervised learning methods tailored for real-world object detection, emphasizing small object detection in complex environments, and compares their effectiveness across CNN and ViT architectures.

Contribution

The survey provides a detailed comparison of SSL strategies, such as instance discrimination and Masked Image Modeling (MIM), for object detection, particularly of small objects, and offers practical guidance for method selection.

Findings

01 Instance discrimination performs well with CNN encoders.

02 MIM methods are more effective with ViT architectures.

03 Pre-training on domain-specific data improves detection performance.

Abstract

Self-Supervised Learning (SSL) has emerged as a promising approach in computer vision, enabling networks to learn meaningful representations from large unlabeled datasets. SSL methods fall into two main categories: instance discrimination and Masked Image Modeling (MIM). While instance discrimination is fundamental to SSL, it was originally designed for classification and may be less effective for object detection, particularly for small objects. In this survey, we focus on SSL methods specifically tailored for real-world object detection, with an emphasis on detecting small objects in complex environments. Unlike previous surveys, we offer a detailed comparison of SSL strategies, including object-level instance discrimination and MIM methods, and assess their effectiveness for small object detection using both CNN and ViT-based architectures. Specifically, our benchmark is performed on…
