Visual Representation Learning Does Not Generalize Strongly Within the Same Domain
Lukas Schott, Julius von Kügelgen, Frederik Träuble, Peter Gehler, Chris Russell, Matthias Bethge, Bernhard Schölkopf, Francesco Locatello, Wieland Brendel

TL;DR
This study evaluates 17 representation learning methods on whether they correctly infer underlying factors of variation when those factors are recomposed, interpolated, or extrapolated at test time, revealing significant limitations, especially in real-world scenarios.
Contribution
The paper provides a comprehensive benchmark showing that current models fail to learn mechanisms of variation, highlighting a critical gap in generalization capabilities.
Findings
All models struggle to learn underlying mechanisms.
Generalization drops significantly on real-world datasets.
Models still infer factors accurately in-distribution even though they fail to learn the underlying mechanisms.
Abstract
An important component for generalization in machine learning is to uncover underlying latent factors of variation as well as the mechanism through which each factor acts in the world. In this paper, we test whether 17 unsupervised, weakly supervised, and fully supervised representation learning approaches correctly infer the generative factors of variation in simple datasets (dSprites, Shapes3D, MPI3D) from controlled environments, and on our contributed CelebGlow dataset. In contrast to prior robustness work that introduces novel factors of variation during test time, such as blur or other (un)structured noise, we here recompose, interpolate, or extrapolate only existing factors of variation from the training data set (e.g., small and medium-sized objects during training and large objects during testing). Models that learn the correct mechanism should be able to generalize to this…
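The three test settings named in the abstract (recomposition, interpolation, extrapolation) can be illustrated with a minimal sketch of how such train/test splits might be built over a grid of factor values. This is not the authors' benchmark code: the factor names, value ranges, thresholds, and helper functions below are all hypothetical and chosen only to mirror the "small and medium-sized objects during training and large objects during testing" example.

```python
from itertools import product

# Hypothetical factor grid; names and value ranges are illustrative only.
FACTORS = {"shape": [0, 1, 2], "size": [0, 1, 2, 3, 4, 5]}


def all_combinations(factors):
    """Enumerate every factor combination as a dict."""
    names = list(factors)
    return [dict(zip(names, values)) for values in product(*factors.values())]


def extrapolation_split(samples, factor, threshold):
    """Train on values <= threshold of one factor, test on values above it."""
    train = [s for s in samples if s[factor] <= threshold]
    test = [s for s in samples if s[factor] > threshold]
    return train, test


def interpolation_split(samples, factor, held_out):
    """Hold out interior values of one factor; the extremes stay in training."""
    train = [s for s in samples if s[factor] not in held_out]
    test = [s for s in samples if s[factor] in held_out]
    return train, test


def composition_split(samples, is_held_out):
    """Hold out specific combinations; each single value still occurs in training."""
    train = [s for s in samples if not is_held_out(s)]
    test = [s for s in samples if is_held_out(s)]
    return train, test


samples = all_combinations(FACTORS)

# Extrapolation: small and medium sizes in training, large sizes only at test time.
extra_train, extra_test = extrapolation_split(samples, "size", threshold=3)

# Interpolation: intermediate sizes are unseen, but smaller and larger ones are seen.
inter_train, inter_test = interpolation_split(samples, "size", held_out={2, 3})

# Recomposition: shape 2 is never paired with large sizes during training.
comp_train, comp_test = composition_split(
    samples, lambda s: s["shape"] == 2 and s["size"] >= 4
)
```

In all three splits the test set contains only factor values drawn from the same grid as training, so no novel factor of variation is introduced; what changes is which values or combinations the model has seen.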
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques
