Self-EMD: Self-Supervised Object Detection without ImageNet

Songtao Liu; Zeming Li; Jian Sun

arXiv:2011.13677 · cs.CV · March 23, 2021 · 67 cites

TL;DR

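Self-EMD is a self-supervised representation learning method for object detection that trains directly on unlabeled non-iconic datasets such as COCO. It keeps convolutional feature maps as the image embedding to preserve spatial structure, compares two embeddings with Earth Mover's Distance, and achieves competitive results without ImageNet pretraining.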

Contribution

The paper presents a self-supervised learning method for object detection that trains on non-iconic datasets, preserves spatial structure by using convolutional feature maps as embeddings, and measures the similarity between two embeddings with Earth Mover's Distance.

Findings

01 Achieves 39.8% mAP on COCO with a Faster R-CNN (ResNet50-FPN) baseline.

02 Improves to 40.4% mAP with additional unlabeled images.

03 Performs on par with state-of-the-art self-supervised methods pre-trained on ImageNet.

Abstract

In this paper, we propose a novel self-supervised representation learning method, Self-EMD, for object detection. Our method is trained directly on unlabeled non-iconic image datasets like COCO, instead of the commonly used iconic-object image datasets like ImageNet. We keep the convolutional feature maps as the image embedding to preserve spatial structures and adopt Earth Mover's Distance (EMD) to compute the similarity between two embeddings. Our Faster R-CNN (ResNet50-FPN) baseline achieves 39.8% mAP on COCO, which is on par with the state-of-the-art self-supervised methods pre-trained on ImageNet. More importantly, it can be further improved to 40.4% mAP with more unlabeled images, showing its great potential for leveraging more easily obtained unlabeled data. Code will be made available.
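
To make the idea of comparing two spatial embeddings with EMD concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the cost function, the marginal weights, or the solver, so the cosine-based cost, the uniform weights, and the entropy-regularized Sinkhorn approximation of EMD used here are all assumptions, as are the function name `emd_similarity` and the parameters `eps` and `n_iters`.

```python
import numpy as np


def emd_similarity(feat_a, feat_b, eps=0.05, n_iters=200):
    """Approximate EMD-based similarity between two feature maps.

    feat_a, feat_b: arrays of shape (C, H, W), e.g. backbone outputs for
    two views of one image. Returns (similarity, transport_plan).
    Assumptions (not from the paper): cosine cost, uniform marginals,
    Sinkhorn iterations instead of an exact EMD solver.
    """
    channels = feat_a.shape[0]
    # Each spatial location becomes one C-dimensional local descriptor.
    x = feat_a.reshape(channels, -1).T  # (N, C)
    y = feat_b.reshape(channels, -1).T  # (M, C)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    y = y / (np.linalg.norm(y, axis=1, keepdims=True) + 1e-8)

    sim = x @ y.T        # pairwise cosine similarity, shape (N, M)
    cost = 1.0 - sim     # transport cost between locations

    n, m = cost.shape
    r = np.full(n, 1.0 / n)  # uniform mass on locations of feat_a
    c = np.full(m, 1.0 / m)  # uniform mass on locations of feat_b

    # Sinkhorn iterations: alternately rescale rows and columns of the
    # kernel so the plan's marginals approach r and c.
    kernel = np.exp(-cost / eps)
    u = np.ones(n) / n
    for _ in range(n_iters):
        v = c / (kernel.T @ u)
        u = r / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]

    # Similarity is the plan-weighted sum of pairwise similarities.
    return float((plan * sim).sum()), plan


rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 4, 4))
other = rng.standard_normal((16, 4, 4))
shifted = np.roll(feat, shift=1, axis=2)  # same content, moved one column

sim_same, plan_same = emd_similarity(feat, feat)
sim_shifted, _ = emd_similarity(feat, shifted)
sim_other, _ = emd_similarity(feat, other)
```

In this sketch, `sim_shifted` matches `sim_same` because a circular shift only permutes the spatial locations and the transport plan is free to re-match them, while `sim_other` is lower. That is the property a spatially structured embedding needs when two crops of a non-iconic image show the same objects at different positions.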

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Softmax · RoIPool · Region Proposal Network · Convolution · Faster R-CNN