I Walk the Line: Examining the Role of Gestalt Continuity in Object Binding for Vision Transformers
Alexa R. Tartaglini, Michael A. Lepori

TL;DR
This paper investigates how vision transformers use Gestalt continuity principles for object binding, revealing specific attention heads that track continuity and contribute to object representation.
Contribution
It demonstrates that pretrained vision transformers rely on Gestalt continuity for object binding, identifying and ablating attention heads involved in this process.
Findings
Binding probes are sensitive to continuity in vision transformers.
Certain attention heads track continuity across datasets.
Ablating these heads affects object binding representations.
Abstract
Object binding is a foundational process in visual cognition, during which low-level perceptual features are joined into object representations. Binding has been considered a fundamental challenge for neural networks, and a major milestone on the way to artificial models with flexible visual intelligence. Recently, several investigations have demonstrated evidence that binding mechanisms emerge in pretrained vision models, enabling them to associate portions of an image that contain an object. The question remains: how are these models binding objects together? In this work, we investigate whether vision models rely on the principle of Gestalt continuity to perform object binding, over and above other principles like similarity and proximity. Using synthetic datasets, we demonstrate that binding probes are sensitive to continuity across a wide range of pretrained vision transformers.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
