Bridging the Gap to Real-World Object-Centric Learning

Maximilian Seitzer; Max Horn; Andrii Zadaianchuk; Dominik Zietlow; Tianjun Xiao; Carl-Johann Simon-Gabriel; Tong He; Zheng Zhang; Bernhard Schölkopf; Thomas Brox; Francesco Locatello

arXiv:2209.14860 · cs.CV · March 8, 2023 · 31 citations

TL;DR

This paper introduces DINOSAUR, an unsupervised object-centric learning model that is trained by reconstructing features from models that were themselves trained in a self-supervised manner. This training signal lets it scale from simulated data to real-world datasets such as COCO and PASCAL VOC.

Contribution

DINOSAUR demonstrates that feature reconstruction from self-supervised models is sufficient for unsupervised object-centric learning, surpassing existing methods on real-world datasets.
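The feature-reconstruction signal described above can be illustrated with a minimal sketch. The source states only that features from a self-supervised model serve as the reconstruction target. The softmax-over-slots mixing, the mean squared error, and the names `combine_slots` and `feature_reconstruction_loss` are assumptions made here for illustration, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combine_slots(slot_features, slot_logits):
    """Mix per-slot patch predictions into one reconstruction (assumed scheme).

    slot_features[k][n] is the feature vector slot k predicts for patch n.
    slot_logits[k][n] is slot k's unnormalized mask logit for patch n.
    Returns the reconstructed features and the per-patch slot masks.
    """
    num_slots = len(slot_features)
    num_patches = len(slot_features[0])
    dim = len(slot_features[0][0])
    recon, masks = [], []
    for n in range(num_patches):
        # Each patch distributes its responsibility across slots.
        weights = softmax([slot_logits[k][n] for k in range(num_slots)])
        masks.append(weights)
        recon.append([
            sum(weights[k] * slot_features[k][n][d] for k in range(num_slots))
            for d in range(dim)
        ])
    return recon, masks


def feature_reconstruction_loss(recon, target):
    """Mean squared error against features from a frozen encoder (assumed loss)."""
    sq = [(r - t) ** 2 for rv, tv in zip(recon, target) for r, t in zip(rv, tv)]
    return sum(sq) / len(sq)


# Toy usage: 2 slots, 2 patches, 2-dimensional features.
# `target` stands in for features from a frozen self-supervised model.
target = [[1.0, 0.0], [0.0, 1.0]]
slot_features = [[[1.0, 0.0], [9.0, 9.0]], [[9.0, 9.0], [0.0, 1.0]]]
slot_logits = [[50.0, -50.0], [-50.0, 50.0]]
recon, masks = combine_slots(slot_features, slot_logits)
print(feature_reconstruction_loss(recon, target))
```

In this toy setup each slot explains exactly one patch, so the masks act as a soft segmentation and the loss is close to zero. With uninformative (equal) logits the slots blur together and the loss rises.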

Findings

01 Outperforms existing image-based object-centric learning models on simulated data

02 First unsupervised object-centric model to scale to real-world datasets such as COCO and PASCAL VOC

03 Achieves performance competitive with more complex pipelines

Abstract

Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. Our approach, DINOSAUR, significantly outperforms existing image-based object-centric learning models on simulated data and is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC. DINOSAUR is…

Code & Models

Videos

Bridging the Gap to Real-World Object-Centric Learning · slideslive

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning