Self-Supervised YOLO: Leveraging Contrastive Learning for Label-Efficient Object Detection
Manikanta Kotthapalli, Reshma Bhatia, Nainsi Jain

TL;DR
This paper shows that contrastive self-supervised learning (SimCLR) can pretrain YOLOv5 and YOLOv8 backbones on unlabeled images, reducing reliance on large labeled datasets while improving detection performance when only limited labels are available for fine-tuning.
Contribution
It introduces a contrastive SSL pretraining pipeline for YOLO backbones, improving label efficiency and detection accuracy in low-label scenarios.
Findings
SSL pretraining improves mAP and convergence speed
SSL-pretrained YOLO backbones outperform supervised-only baselines when fine-tuned on limited labeled data
Effective use of unlabeled data for scalable object detection
Abstract
One-stage object detectors such as the YOLO family achieve state-of-the-art performance in real-time vision applications but remain heavily reliant on large-scale labeled datasets for training. In this work, we present a systematic study of contrastive self-supervised learning (SSL) as a means to reduce this dependency by pretraining YOLOv5 and YOLOv8 backbones on unlabeled images using the SimCLR framework. Our approach introduces a simple yet effective pipeline that adapts YOLO's convolutional backbones as encoders, employs global pooling and projection heads, and optimizes a contrastive loss using augmentations of the COCO unlabeled dataset (120k images). The pretrained backbones are then fine-tuned on a cyclist detection task with limited labeled data. Experimental results show that SSL pretraining leads to consistently higher mAP, faster convergence, and improved precision-recall…
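The contrastive objective used by SimCLR is the NT-Xent loss, computed over the projection-head outputs of two augmented views of each image. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the temperature value of 0.5, and the toy 2-D vectors are illustrative assumptions, and in the described pipeline the inputs would be the projected, globally pooled YOLO backbone features.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for a batch of N image pairs.

    z_a[i] and z_b[i] are the projections of two augmented views of image i.
    temperature=0.5 is an illustrative default, not a value from the paper.
    """
    n = len(z_a)
    # Stack both views into 2N unit vectors so dot products are cosine similarities.
    z = [l2_normalize(v) for v in list(z_a) + list(z_b)]
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every other sample (self excluded), scaled by temperature.
        sims = [dot(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos_sim = dot(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(sims)
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_denom - pos_sim
    return total / (2 * n)


# Toy check: correctly paired views should score a lower loss than mismatched ones.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent_loss(views_a, [[1.0, 0.0], [0.0, 1.0]])
shuffled = nt_xent_loss(views_a, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the aligned pairing yields the lower loss, which is the signal that pushes an encoder to map two augmentations of the same image close together and different images apart.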
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
