Characterizing Universal Object Representations Across Vision Models
Florian P. Mahner, Johannes Roth, Ka Chun Lam, Michael F. Bonner, Francisco Pereira, and Martin N. Hebart

TL;DR
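This study decomposes the visual representations of 162 diverse deep neural networks into interpretable dimensions. It identifies which dimensions are universal across models, shows that these align with biological vision, and finds that their universality is not explained by model-specific factors such as architecture or training data.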
Contribution
It introduces a method to identify universal object representation dimensions across models and links these to biological vision and semantic properties.
Findings
Universal dimensions are more interpretable than model-specific dimensions and are more strongly driven by conceptual image properties.
Differences in architecture, objective function, training data, or model size do not explain universality.
Models with more universal dimensions better predict biological visual responses.
Abstract
Deep neural networks trained with different architectures, objectives, and datasets have been reported to converge on similar visual representations. However, what remains unknown is which visual properties models actually converge on and which factors may underlie this convergence. To address this, we decompose the object similarity structure of 162 diverse vision models into a small set of non-negative dimensions. To determine universal versus model-specific dimensions, we then estimate how often each dimension reappears across models. In contrast to model-specific dimensions, universal dimensions are more interpretable and more strongly driven by conceptual image properties, indicating the relevance of interpretability and semantic content as implicit factors driving universality across models. Differences in architecture, objective function, training data, model size, and model…
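The abstract only outlines the pipeline: build an object similarity structure per model, decompose it into a few non-negative dimensions, then count how often each dimension reappears in other models. The Python sketch below illustrates that general idea under explicit assumptions and is not the authors' implementation. Cosine similarity, symmetric non-negative matrix factorization (a stand-in for whatever decomposition the paper actually uses), clipping negative similarities to zero, and the correlation threshold for deciding that two dimensions "match" are all illustrative choices. The function names `similarity_matrix`, `symmetric_nmf`, and `universality_scores` are hypothetical.

```python
import numpy as np


def similarity_matrix(embeddings):
    """Cosine similarity between object embeddings (rows = objects).

    Assumption: cosine similarity stands in for the paper's similarity measure.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return x @ x.T


def symmetric_nmf(S, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate S ~= W @ W.T with W >= 0 (objects x k dimensions).

    Assumption: symmetric NMF with damped multiplicative updates is used here
    purely for illustration; it is not necessarily the paper's method.
    """
    rng = np.random.default_rng(seed)
    S = np.clip(S, 0, None)  # assumption: drop negative similarities so S is non-negative
    W = rng.random((S.shape[0], k))
    for _ in range(n_iter):
        num = S @ W
        den = W @ (W.T @ W) + eps
        # Damped multiplicative update: keeps W non-negative at every step.
        W *= 0.5 + 0.5 * num / den
    return W


def universality_scores(dims_per_model, threshold=0.8):
    """For each model's dimensions, the fraction of OTHER models that contain
    a dimension correlating with it at or above `threshold` across objects.

    Assumption: the 0.8 default threshold is hypothetical.
    """
    n_models = len(dims_per_model)
    scores = []
    for i, Wi in enumerate(dims_per_model):
        ki = Wi.shape[1]
        counts = np.zeros(ki, dtype=int)
        for j, Wj in enumerate(dims_per_model):
            if i == j:
                continue
            # Pearson correlation between every dimension of model i and of model j.
            corr = np.corrcoef(Wi.T, Wj.T)[:ki, ki:]
            # A dimension "reappears" in model j if its best match clears the threshold.
            counts += corr.max(axis=1) >= threshold
        scores.append(counts / (n_models - 1))
    return scores
```

In use, each model's embeddings of the same object images would go through `similarity_matrix` and then `symmetric_nmf` to get one `W` per model. The list of `W` matrices is then passed to `universality_scores`: dimensions scoring near 1 recur in most models (universal), and those near 0 are model-specific. Matching by maximum correlation makes the score insensitive to the arbitrary ordering of dimensions within each factorization.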
