Does Data Scaling Lead to Visual Compositional Generalization?

Arnas Uselis; Andrea Dittadi; Seong Joon Oh

arXiv:2507.07102·cs.LG·July 10, 2025

Open Access

TL;DR

This paper investigates whether data scaling improves compositional generalization in vision models, finding that data diversity, not scale, is key to learning compositional structures that enable efficient generalization.

Contribution

The paper demonstrates that compositional generalization depends on data diversity and concept coverage, and that a linearly factored representational structure underpins efficient compositional learning.

Findings

01. Data diversity drives compositional generalization.

02. Increased combinatorial coverage induces a factored representational structure.

03. Pretrained models show partial evidence of this structure.

Abstract

Compositional understanding is crucial for human intelligence, yet it remains unclear whether contemporary vision models exhibit it. The dominant machine learning paradigm is built on the premise that scaling data and model sizes will improve out-of-distribution performance, including compositional generalization. We test this premise through controlled experiments that systematically vary data scale, concept diversity, and combination coverage. We find that compositional generalization is driven by data diversity, not mere data scale. Increased combinatorial coverage forces models to discover a linearly factored representational structure, where concepts decompose into additive components. We prove this structure is key to efficiency, enabling perfect generalization from few observed combinations. Evaluating pretrained models (DINO, CLIP), we find above-random yet imperfect…
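
To make the "linearly factored" idea concrete, here is a minimal sketch in plain Python. It assumes a toy setting with two concept types (called shapes and colors here purely for illustration) whose joint representation is exactly the sum of one vector per concept value. The sizes, names, and the reconstruction rule are assumptions chosen for this example; this is not the paper's experimental setup or its proof. Under that additive assumption, observing only the combinations that share one reference shape or one reference color is enough to recover every unseen combination.

```python
import random

random.seed(0)

# Hypothetical toy sizes: 4 shapes, 5 colors, 8-dimensional representations.
N_SHAPES, N_COLORS, DIM = 4, 5, 8


def rand_vec():
    """Draw a random DIM-dimensional vector."""
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]


# Assumed additive structure: rep(shape i, color j) = u[i] + v[j].
u = [rand_vec() for _ in range(N_SHAPES)]
v = [rand_vec() for _ in range(N_COLORS)]


def rep(i, j):
    """Ground-truth representation of the combination (shape i, color j)."""
    return [a + b for a, b in zip(u[i], v[j])]


# Observe only combinations that involve shape 0 or color 0.
seen = {(i, 0): rep(i, 0) for i in range(N_SHAPES)}
seen.update({(0, j): rep(0, j) for j in range(N_COLORS)})


def reconstruct(i, j):
    """Recover rep(i, j) from observed pairs: rep(i,0) + rep(0,j) - rep(0,0)."""
    return [
        a + b - c
        for a, b, c in zip(seen[(i, 0)], seen[(0, j)], seen[(0, 0)])
    ]


# Largest coordinate-wise error over all combinations, seen and unseen.
max_error = max(
    abs(r - t)
    for i in range(N_SHAPES)
    for j in range(N_COLORS)
    for r, t in zip(reconstruct(i, j), rep(i, j))
)

print(len(seen), "of", N_SHAPES * N_COLORS, "combinations observed")
print("max reconstruction error:", max_error)
```

The reconstruction works because the shared terms cancel: (u[i] + v[0]) + (u[0] + v[j]) - (u[0] + v[0]) = u[i] + v[j]. The number of observed combinations grows with the sum of the concept counts rather than their product, which is one way to read the abstract's claim about generalizing from few observed combinations.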

Taxonomy

Topics: Geochemistry and Geologic Mapping · Topological and Geometric Data Analysis · Domain Adaptation and Few-Shot Learning