From Local Cues to Global Percepts: Emergent Gestalt Organization in Self-Supervised Vision Models
Tianqin Li, Ziqi Wen, Leiran Song, Jun Liu, Zhi Jing, Tai Sing Lee

TL;DR
This paper demonstrates that self-supervised vision models, especially Vision Transformers trained with Masked Autoencoding, develop Gestalt-like global perceptual organization, and introduces a diagnostic test to evaluate this ability.
Contribution
The study shows which training conditions give rise to Gestalt-like organization in vision models and introduces DiSRT (the Distorted Spatial Relationship Testbench) to assess sensitivity to global spatial structure.
Findings
ViTs trained with MAE show activation patterns consistent with Gestalt laws.
Self-supervised models outperform supervised ones on DiSRT.
Sparsity mechanisms can restore global sensitivity in models.
Abstract
Human vision organizes local cues into coherent global forms using Gestalt principles like closure, proximity, and figure-ground assignment -- functions reliant on global spatial structure. We investigate whether modern vision models show similar behaviors, and under what training conditions these emerge. We find that Vision Transformers (ViTs) trained with Masked Autoencoding (MAE) exhibit activation patterns consistent with Gestalt laws, including illusory contour completion, convexity preference, and dynamic figure-ground segregation. To probe the computational basis, we hypothesize that modeling global dependencies is necessary for Gestalt-like organization. We introduce the Distorted Spatial Relationship Testbench (DiSRT), which evaluates sensitivity to global spatial perturbations while preserving local textures. Using DiSRT, we show that self-supervised models (e.g., MAE, CLIP)…
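The idea of perturbing global spatial structure while preserving local textures can be illustrated with a minimal sketch. The code below is not the DiSRT procedure, which the abstract does not specify; it uses plain patch shuffling as a stand-in. The function name `shuffle_patches`, its parameters, and the toy 4x4 image are all assumptions made for illustration.

```python
import random


def shuffle_patches(image, patch, seed=0):
    """Permute the non-overlapping patch x patch tiles of a 2D image.

    `image` is a list of equal-length rows. Pixels inside each tile keep
    their arrangement (local texture); only tile positions change
    (global layout). Hypothetical helper, not the paper's method.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    rows, cols = height // patch, width // patch

    # Cut the image into tiles, stored in row-major order.
    tiles = [
        [row[c * patch:(c + 1) * patch] for row in image[r * patch:(r + 1) * patch]]
        for r in range(rows)
        for c in range(cols)
    ]

    # order[dest] = index of the source tile placed at position dest.
    order = list(range(len(tiles)))
    random.Random(seed).shuffle(order)

    # Reassemble the image with tiles in their new positions.
    out = [[None] * width for _ in range(height)]
    for dest, src in enumerate(order):
        r0, c0 = (dest // cols) * patch, (dest % cols) * patch
        for dy, tile_row in enumerate(tiles[src]):
            out[r0 + dy][c0:c0 + patch] = tile_row
    return out, order


# Toy 4x4 "image" whose pixel values encode their original position.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
shuffled, order = shuffle_patches(image, patch=2, seed=1)
```

Under this stand-in, a model that relies only on local texture would respond similarly to `image` and `shuffled`, while a model sensitive to global structure would not. Comparing a model's outputs on such pairs is one simple way to probe the kind of sensitivity the abstract describes.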
Taxonomy
Topics: Child Therapy and Development · Creativity in Education and Neuroscience · Education and Technology Integration
