Human-Like Coarse Object Representations in Vision Models
Andrey Gizdov, Andrea Procopio, Yichen Li, Daniel Harari, Tomer Ullman

TL;DR
This study investigates whether vision models develop human-like coarse object representations that balance detail and physical prediction, finding that models of intermediate size, training time, or capacity best match human perception.
Contribution
The paper introduces a comparison pipeline and alignment metric to analyze how different vision models develop human-like object bodies under various training conditions.
Findings
Intermediate models best align with human behavior
Resource constraints lead to human-like coarse representations
Model size and training time influence segmentation granularity
Abstract
Humans appear to represent objects for intuitive physics with coarse, volumetric "bodies" that smooth concavities (trading fine visual details for efficient physical predictions), yet their internal structure is largely unknown. Segmentation models, in contrast, optimize pixel-accurate masks that may misalign with such bodies. We ask whether and when these models nonetheless acquire human-like bodies. Using a time-to-collision (TTC) behavioral paradigm, we introduce a comparison pipeline and alignment metric, then vary model training time, size, and effective capacity via pruning. Across all manipulations, alignment with human behavior follows an inverse U-shaped curve: small/briefly trained/pruned models under-segment into blobs; large/fully trained models over-segment with boundary wiggles; and an intermediate "ideal body granularity" best matches humans. This suggests human-like…
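To make concrete why a body that "smooths concavities" changes time-to-collision judgments, here is a minimal illustrative sketch. It is not the authors' pipeline or alignment metric. Instead, it assumes a convex hull as a stand-in for an extremely coarse body and compares it with the exact outline of a C-shaped object. A point probe moves horizontally into the object's mouth. The shape, speed, and the function names `convex_hull`, `first_hit_x`, and `time_to_collision` are all hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical C-shaped object opening to the left: outer square [4,8]x[0,4]
# with a rectangular cavity from x=4 to x=7 between y=1 and y=3.
C_SHAPE: List[Point] = [(4, 0), (8, 0), (8, 4), (4, 4), (4, 3), (7, 3), (7, 1), (4, 1)]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; used here only as a stand-in for a 'coarse body'."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain (it repeats the start of the other chain).
    return lower[:-1] + upper[:-1]


def first_hit_x(polygon: List[Point], x0: float, y: float) -> Optional[float]:
    """Smallest x >= x0 where the horizontal line at height y meets a polygon edge.

    Assumes the probe line never runs along a horizontal edge, so those are skipped.
    """
    best: Optional[float] = None
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if y1 == y2 or not (min(y1, y2) <= y <= max(y1, y2)):
            continue
        t = (y - y1) / (y2 - y1)
        x_hit = x1 + t * (x2 - x1)
        if x_hit >= x0 and (best is None or x_hit < best):
            best = x_hit
    return best


def time_to_collision(polygon: List[Point], x0: float, y: float, speed: float) -> Optional[float]:
    """TTC for a point probe moving in +x at constant speed; None if it never hits."""
    hit = first_hit_x(polygon, x0, y)
    return None if hit is None else (hit - x0) / speed


if __name__ == "__main__":
    exact = time_to_collision(C_SHAPE, x0=0.0, y=2.0, speed=1.0)
    coarse = time_to_collision(convex_hull(C_SHAPE), x0=0.0, y=2.0, speed=1.0)
    print("exact body TTC:", exact)    # exact body TTC: 7.0
    print("coarse body TTC:", coarse)  # coarse body TTC: 4.0
```

With the exact outline, the probe travels into the cavity and collides at x = 7. With the filled-in hull, it collides at the mouth (x = 4), so the coarse body predicts an earlier collision. Under this toy assumption, a TTC paradigm can distinguish representations that fill concavities from pixel-accurate ones.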
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
