TL;DR
Odin is a self-supervised learning framework that automatically discovers image structures like objects and segments, leading to improved transfer learning performance across various vision tasks without relying on manual annotations.
Contribution
The paper introduces Odin, a novel self-supervised approach that jointly learns object discovery and representation, eliminating the need for handcrafted image segmentations or specialized augmentations.
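One way to picture the coupling of object discovery and representation is sketched below. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the discovery step groups per-pixel features into segments with k-means, and that the representation step averages its own features inside each discovered segment to get one vector per candidate object. The names `kmeans_segment` and `mask_pool` are hypothetical, and random arrays stand in for real network outputs.

```python
import numpy as np

def kmeans_segment(features, k, iters=10, seed=0):
    """Cluster per-pixel features of shape (H, W, D) into k segments.

    Returns an (H, W) integer label map. Plain k-means is an assumed
    stand-in for the discovery step, chosen only for illustration.
    """
    h, w, d = features.shape
    x = features.reshape(-1, d)
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct pixels (fancy indexing copies).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every pixel to every center: (N, k).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave empty clusters where they are
                centers[j] = members.mean(0)
    return labels.reshape(h, w)

def mask_pool(features, labels, k):
    """Average (H, W, D) features inside each segment, giving (k, D).

    Segments with no pixels yield a zero vector.
    """
    h, w, d = features.shape
    x = features.reshape(-1, d)
    onehot = np.eye(k)[labels.reshape(-1)]          # (N, k) membership
    counts = onehot.sum(0)                          # pixels per segment
    return onehot.T @ x / np.maximum(counts, 1)[:, None]

# Random stand-ins for the two networks' feature maps (hypothetical shapes).
rng = np.random.default_rng(0)
discovery_feats = rng.normal(size=(8, 8, 16))
representation_feats = rng.normal(size=(8, 8, 16))

masks = kmeans_segment(discovery_feats, k=4)                   # discovered segments
object_vectors = mask_pool(representation_feats, masks, k=4)   # one vector per segment
print(object_vectors.shape)  # (4, 16)
```

In a setup like this, the per-segment vectors would feed an object-level self-supervised objective, so that no hand-crafted segmentation is needed to define the regions being compared.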
Findings
Achieves state-of-the-art transfer learning results for object detection and instance segmentation on COCO, and for semantic segmentation on PASCAL and Cityscapes.
Surpasses supervised pre-training for video segmentation on DAVIS.
Is simpler, less brittle, and more general than prior SSL methods that depend on hand-crafted segmentations or specialized augmentations.
Abstract
The promise of self-supervised learning (SSL) is to leverage large amounts of unlabeled data to solve complex tasks. While there has been excellent progress with simple, image-level learning, recent methods have shown the advantage of including knowledge of image structure. However, by introducing hand-crafted image segmentations to define regions of interest, or specialized augmentation strategies, these methods sacrifice the simplicity and generality that make SSL so powerful. Instead, we propose a self-supervised learning paradigm that discovers this image structure by itself. Our method, Odin, couples object discovery and representation networks to discover meaningful image segmentations without any supervision. The resulting learning paradigm is simpler, less brittle, and more general, and achieves state-of-the-art transfer learning results for object detection and instance…
