Computer Vision Models Show Human-Like Sensitivity to Geometric and Topological Concepts
Zekun Wang, Sashank Varma

TL;DR
This study evaluates how different computer vision models recognize geometric and topological concepts, finding that transformer models align closely with human performance and suggesting that these concepts can be learned through interaction with the environment.
Contribution
The paper demonstrates that transformer-based vision models outperform the other model classes in recognizing geometric and topological (GT) concepts and align with human sensitivity, which supports the learning account over the innate core knowledge account.
Findings
Transformers outperform CNNs and vision-language models on GT tasks.
Transformers show strong alignment with children's performance.
Vision-language models underperform and deviate from human profiles.
Abstract
With the rapid improvement of machine learning (ML) models, cognitive scientists are increasingly asking about their alignment with how humans think. Here, we ask this question for computer vision models and human sensitivity to geometric and topological (GT) concepts. Under the core knowledge account, these concepts are innate and supported by dedicated neural circuitry. In this work, we investigate an alternative explanation, that GT concepts are learned "for free" through everyday interaction with the environment. We do so using computer vision models, which are trained on large image datasets. We build on prior studies to investigate the overall performance and human alignment of three classes of models -- convolutional neural networks (CNNs), transformer-based models, and vision-language models -- on an odd-one-out task testing 43 GT concepts spanning seven classes.…
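The abstract does not say how a model's answer on the odd-one-out task is read out. One common approach, assumed here purely for illustration, is to embed each image in a trial and pick the image whose embedding is least similar, on average, to the others. The sketch below shows that readout in Python; the function names `cosine` and `odd_one_out` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def odd_one_out(embeddings: Sequence[Sequence[float]]) -> int:
    """Return the index of the item least similar, on average, to the rest.

    Assumption (not from the paper): the model's choice is the image whose
    embedding has the lowest mean cosine similarity to the other images.
    """
    n = len(embeddings)
    if n < 3:
        raise ValueError("an odd-one-out trial needs at least three items")
    mean_sims = []
    for i in range(n):
        sims = [cosine(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        mean_sims.append(sum(sims) / len(sims))
    # The lowest mean similarity marks the outlier.
    return min(range(n), key=mean_sims.__getitem__)


# Toy trial: five embeddings point roughly along the first axis,
# and the item at index 3 points along the second axis.
toy_trial = [
    [1.0, 0.0],
    [0.9, 0.1],
    [1.0, 0.1],
    [0.1, 1.0],
    [0.95, 0.05],
    [1.0, 0.05],
]
choice = odd_one_out(toy_trial)
```

In a real evaluation the toy vectors would be replaced by features extracted from each model, and the chosen index would be compared with the labeled odd item to score the trial.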
Taxonomy
Topics: Child and Animal Learning Development · Face Recognition and Perception · Action Observation and Synchronization
