Generalization and Robustness Implications in Object-Centric Learning
Andrea Dittadi, Samuele Papa, Michele De Vita, Bernhard Schölkopf, Ole Winther, Francesco Locatello

TL;DR
This paper evaluates how object-centric neural network models generalize and maintain robustness across various distribution shifts in multi-object scenes, highlighting their strengths and limitations.
Contribution
It provides a comprehensive empirical analysis of state-of-the-art object-centric models on multiple datasets, focusing on generalization and robustness under diverse distribution shifts.
Findings
Object-centric models improve downstream task performance.
Models show robustness to certain distribution shifts.
Robustness varies with the type of distribution shift.
Abstract
The idea behind object-centric representation learning is that natural scenes can better be modeled as compositions of objects and their relations as opposed to distributed representations. This inductive bias can be injected into neural networks to potentially improve systematic generalization and performance of downstream tasks in scenes with multiple objects. In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets and evaluate segmentation metrics and downstream object property prediction. In addition, we study generalization and robustness by investigating the settings where either a single object is out of distribution -- e.g., having an unseen color, texture, or shape -- or global properties of the scene are altered -- e.g., by occlusions, cropping, or increasing the number of objects. From our experimental study, we find object-centric…
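The abstract mentions segmentation metrics without naming them in the text shown here. As an illustration only, the sketch below computes the Adjusted Rand Index (ARI), a permutation-invariant clustering score that is often used to compare predicted object masks with ground-truth masks. Choosing ARI is an assumption, not something stated above, and the function name and the flat per-pixel label arrays are hypothetical.

```python
import numpy as np


def adjusted_rand_index(true_ids, pred_ids):
    """ARI between two per-pixel object-id maps (hypothetical helper).

    Both inputs hold one integer object id per pixel. The score does not
    depend on how the ids are numbered, only on which pixels are grouped.
    """
    true_ids = np.asarray(true_ids).ravel()
    pred_ids = np.asarray(pred_ids).ravel()
    if true_ids.shape != pred_ids.shape:
        raise ValueError("inputs must contain the same number of pixels")

    # Map arbitrary ids to 0..K-1 and build the contingency table.
    _, t = np.unique(true_ids, return_inverse=True)
    _, p = np.unique(pred_ids, return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)

    def pairs(x):
        # Number of unordered pairs that can be drawn from x items.
        return x * (x - 1) / 2.0

    total = pairs(true_ids.size)
    if total == 0:
        return 1.0  # a single pixel: the partitions trivially agree

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()

    expected = sum_rows * sum_cols / total
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both maps have a single segment
    return (sum_cells - expected) / (max_index - expected)


# Same grouping with swapped ids: perfect agreement.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Every predicted segment splits every true segment: below chance.
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A score of 1 means the two segmentations group pixels identically, values near 0 correspond to chance-level agreement, and negative values indicate agreement below chance.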
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
