From Local Cues to Global Percepts: Emergent Gestalt Organization in Self-Supervised Vision Models

Tianqin Li; Ziqi Wen; Leiran Song; Jun Liu; Zhi Jing; Tai Sing Lee

arXiv:2506.00718 · cs.CV · June 3, 2025

TL;DR

This paper demonstrates that self-supervised vision models, especially Vision Transformers trained with Masked Autoencoding, develop Gestalt-like global perceptual organization, and introduces a diagnostic test to evaluate this ability.

Contribution

The study identifies training conditions under which vision models develop Gestalt-like perceptual organization, and introduces the Distorted Spatial Relationship Testbench (DiSRT) to measure sensitivity to global spatial structure.

Findings

01  ViTs trained with MAE show activation patterns consistent with Gestalt laws.

02  Self-supervised models outperform supervised ones on DiSRT.

03  Sparsity mechanisms can restore global sensitivity in models.
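
To make the third finding concrete, here is a minimal NumPy sketch of one common activation-sparsity scheme, top-k selection. This summary does not say which sparsity mechanism the paper uses, so the choice of top-k, the function name `top_k_sparsify`, and the parameter `k` are all assumptions made for illustration only.

```python
import numpy as np

def top_k_sparsify(acts, k):
    """Keep the k largest activations along the last axis and zero the rest.

    Hypothetical helper: top-k is assumed here as an example of a sparsity
    mechanism; it is not confirmed by the summary above.
    """
    if not 1 <= k <= acts.shape[-1]:
        raise ValueError("k must be between 1 and the size of the last axis")
    # Indices of the k largest values per row (their order is arbitrary).
    idx = np.argpartition(acts, -k, axis=-1)[..., -k:]
    mask = np.zeros(acts.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    # Entries outside the top k are set to zero; survivors keep their value.
    return np.where(mask, acts, 0.0)

acts = np.array([[1.0, 5.0, 3.0, 2.0],
                 [4.0, 0.0, -1.0, 9.0]])
sparse = top_k_sparsify(acts, k=2)
```

In a network, a step like this would typically be applied to a layer's feature vector at each spatial position, so that only the strongest responses are passed forward.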

Abstract

Human vision organizes local cues into coherent global forms using Gestalt principles like closure, proximity, and figure-ground assignment -- functions reliant on global spatial structure. We investigate whether modern vision models show similar behaviors, and under what training conditions these emerge. We find that Vision Transformers (ViTs) trained with Masked Autoencoding (MAE) exhibit activation patterns consistent with Gestalt laws, including illusory contour completion, convexity preference, and dynamic figure-ground segregation. To probe the computational basis, we hypothesize that modeling global dependencies is necessary for Gestalt-like organization. We introduce the Distorted Spatial Relationship Testbench (DiSRT), which evaluates sensitivity to global spatial perturbations while preserving local textures. Using DiSRT, we show that self-supervised models (e.g., MAE, CLIP)…
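
The idea of perturbing global spatial structure while preserving local textures can be sketched with a simple stand-in: shuffling non-overlapping image tiles. This is not the DiSRT procedure itself, which the text above does not specify; `shuffle_patches`, `global_sensitivity`, and the `embed` callable are hypothetical names, and the cosine-distance score is an assumed proxy for "sensitivity".

```python
import numpy as np

def shuffle_patches(image, patch, rng):
    """Randomly permute non-overlapping patch x patch tiles of an H x W x C image.

    Pixels inside each tile are untouched (local texture is kept), while the
    arrangement of tiles (global layout) is scrambled.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of patch")
    gh, gw = h // patch, w // patch
    # Cut the image into a stack of gh * gw tiles.
    tiles = (image.reshape(gh, patch, gw, patch, c)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(gh * gw, patch, patch, c))
    tiles = tiles[rng.permutation(gh * gw)]
    # Reassemble the permuted tiles into an image of the original shape.
    return (tiles.reshape(gh, gw, patch, patch, c)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(h, w, c))

def global_sensitivity(embed, image, patch, rng):
    """Cosine distance between embeddings of an image and its shuffled copy.

    `embed` is any callable mapping an image array to a feature array; a real
    experiment would pass a vision model's encoder here (assumption).
    """
    a = np.asarray(embed(image), dtype=float).ravel()
    b = np.asarray(embed(shuffle_patches(image, patch, rng)), dtype=float).ravel()
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return 1.0 - cos
```

Under this proxy, an embedding that only pools local statistics (for example, a per-channel mean) scores near zero, while one that encodes spatial layout scores higher.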

Taxonomy

Topics: Child Therapy and Development · Creativity in Education and Neuroscience · Education and Technology Integration