Self-Supervised Learning for Real-World Object Detection: a Survey
Alina Ciocarlan, Sidonie Lefebvre, Sylvie Le Hégarat-Mascle, Arnaud Woiselle

TL;DR
This survey reviews self-supervised learning methods tailored for real-world object detection, emphasizing small object detection in complex environments, and compares their effectiveness across CNN and ViT architectures.
Contribution
It provides a detailed comparison of SSL strategies such as instance discrimination and Masked Image Modeling (MIM) for object detection, especially for small objects, and offers practical guidance for method selection.
Findings
Instance discrimination performs well with CNN encoders.
MIM methods are more effective with ViT architectures.
Pre-training on domain-specific data improves detection performance.
Abstract
Self-Supervised Learning (SSL) has emerged as a promising approach in computer vision, enabling networks to learn meaningful representations from large unlabeled datasets. SSL methods fall into two main categories: instance discrimination and Masked Image Modeling (MIM). While instance discrimination is fundamental to SSL, it was originally designed for classification and may be less effective for object detection, particularly for small objects. In this survey, we focus on SSL methods specifically tailored for real-world object detection, with an emphasis on detecting small objects in complex environments. Unlike previous surveys, we offer a detailed comparison of SSL strategies, including object-level instance discrimination and MIM methods, and assess their effectiveness for small object detection using both CNN and ViT-based architectures. Specifically, our benchmark is performed on…
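To make the two SSL families concrete, here is a minimal NumPy sketch of their core training signals. It is not code from the survey or from any method it reviews. Instance discrimination is illustrated with an InfoNCE-style contrastive loss, where two augmented views of the same image are positives and the other images in the batch are negatives. MIM is illustrated with MAE-style random patch masking and a reconstruction loss computed only on the masked patches. The function names `info_nce_loss`, `random_patch_mask` and `mim_reconstruction_loss` are hypothetical, and the default temperature (0.1) and mask ratio (0.75) are illustrative assumptions.

```python
import numpy as np


def info_nce_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """Instance-discrimination (InfoNCE) loss for a batch of paired embeddings.

    z1[i] and z2[i] embed two augmented views of the same image (positive pair);
    every z2[j] with j != i acts as a negative for z1[i]. Hypothetical helper.
    """
    # L2-normalise so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) similarity matrix
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal: average their negative log-likelihood.
    return float(-np.mean(np.diag(log_probs)))


def random_patch_mask(num_patches: int, mask_ratio: float = 0.75, seed: int = 0) -> np.ndarray:
    """MIM-style random masking. True marks a patch hidden from the encoder (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def mim_reconstruction_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error over masked patches only (pred/target shape: patches x pixels)."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # one error value per patch
    return float(per_patch[mask].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    views = rng.normal(size=(8, 128))
    near_copy = views + 0.01 * rng.normal(size=views.shape)
    # Aligned positive pairs should give a much lower loss than unrelated pairs.
    print(info_nce_loss(views, near_copy), info_nce_loss(views, rng.normal(size=(8, 128))))
    # A 14 x 14 ViT patch grid (196 patches) with 75% masking hides 147 patches.
    print(random_patch_mask(196).sum())
```

The contrastive loss acts on one global embedding per image, whereas the MIM loss supervises individual patches. This contrast is one intuition for why the abstract treats image-level instance discrimination as potentially less suited to small objects.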
