Towards Self-Supervised Learning of Global and Object-Centric Representations
Federico Baldassarre, Hossein Azizpour

TL;DR
This paper studies self-supervised learning of structured, object-centric representations in multi-entity scenes, focusing on attention-based competition and contrastive losses, and validates its claims with experiments on the CLEVR dataset.
Contribution
The paper shows that attention-based competition, combined with contrastive losses equipped with matching, can learn object-centric representations directly in latent space, without pixel reconstruction.
Findings
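Competition in attention, where each image patch is attended by exactly one object, is important for object discovery.
Contrastive losses equipped with matching can be applied directly in latent space, avoiding pixel-based reconstruction.
The objective is sensitive to false negatives (recurring objects) and false positives (matching errors), so data augmentation and negative sample selection require care.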
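To make the first finding concrete, here is a minimal NumPy sketch of competitive attention, assuming a Slot Attention-style formulation in which the softmax is taken over object slots rather than over patches. The function names, shapes, and random inputs are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def competitive_attention(slots, patches):
    """Hypothetical helper. slots: (K, D) object queries, patches: (N, D) features.

    Returns the attention map (K, N) and the updated slots (K, D).
    """
    d = slots.shape[-1]
    logits = slots @ patches.T / np.sqrt(d)  # (K, N) scaled dot products
    # Competition: normalize over the slot axis, so the K slots compete
    # for each patch and every patch's weights sum to 1 across slots.
    attn = softmax(logits, axis=0)
    # Each slot is updated with the weighted mean of the patches it won.
    weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
    updates = weights @ patches  # (K, D)
    return attn, updates

# Illustrative random inputs (assumption: 4 slots, 64 patches, 16 dimensions).
rng = np.random.default_rng(0)
slots = rng.normal(size=(4, 16))
patches = rng.normal(size=(64, 16))
attn, updates = competitive_attention(slots, patches)
```

With the softmax over slots, the assignment of a patch is soft during training but becomes close to exclusive as one slot's logit dominates. Normalizing over patches instead (axis=1) would let several slots attend to the same patch without competing for it.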
Abstract
Self-supervision allows learning meaningful representations of natural images, which usually contain one central object. How well does it transfer to multi-entity scenes? We discuss key aspects of learning structured object-centric representations with self-supervision and validate our insights through several experiments on the CLEVR dataset. Regarding the architecture, we confirm the importance of competition for attention-based object discovery, where each image patch is exclusively attended by one object. For training, we show that contrastive losses equipped with matching can be applied directly in a latent space, avoiding pixel-based reconstruction. However, such an optimization objective is sensitive to false negatives (recurring objects) and false positives (matching errors). Careful consideration is thus required around data augmentation and negative sample selection.
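As a rough illustration of a contrastive loss with matching in latent space, the sketch below matches object vectors from two views of the same image and then applies an InfoNCE-style loss against object vectors from other images. It is written under stated assumptions: the brute-force permutation search stands in for whatever matching procedure the paper uses, and the function names, temperature, and toy data are hypothetical.

```python
import itertools
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def match_objects(a, b):
    """Assumed matching step: brute-force search for the permutation of b's rows
    that maximizes total cosine similarity with a. Only practical for small K."""
    sim = a @ b.T
    k = a.shape[0]
    best = max(itertools.permutations(range(k)),
               key=lambda p: sum(sim[i, p[i]] for i in range(k)))
    return np.array(best)

def matched_info_nce(a, b, negatives, temperature=0.1):
    """a, b: (K, D) object vectors from two views of one image.
    negatives: (M, D) object vectors from other images."""
    a, b, negatives = l2_normalize(a), l2_normalize(b), l2_normalize(negatives)
    perm = match_objects(a, b)
    pos = np.sum(a * b[perm], axis=1) / temperature   # (K,) matched positives
    neg = a @ negatives.T / temperature               # (K, M) negatives
    logits = np.concatenate([pos[:, None], neg], axis=1)
    # Cross-entropy with the positive in column 0, via a stable log-sum-exp.
    m = logits.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(log_z - pos)), perm

# Toy data (assumption): view b contains the objects of view a in shuffled order.
rng = np.random.default_rng(0)
a = np.eye(3, 8)
b = a[[2, 0, 1]] + 0.05 * rng.normal(size=(3, 8))
negatives = rng.normal(size=(5, 8))
loss, perm = matched_info_nce(a, b, negatives)

# A recurring object among the negatives acts as a false negative and raises the loss.
loss_false_neg, _ = matched_info_nce(a, b, np.concatenate([negatives, a[:1]]))
```

In this toy setup the matching recovers the shuffled order, and adding a copy of an object to the negative set increases the loss, which mirrors the sensitivity to recurring objects described in the abstract. A matching error would instead pair the wrong objects as positives, the false-positive case.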
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
