Geometry without Position? When Positional Embeddings Help and Hurt Spatial Reasoning

Jian Shi; Michael Birsak; Wenqing Cui; Zhenyu Li; Peter Wonka

arXiv:2601.22231 · cs.CV · February 2, 2026

TL;DR

This paper investigates the role of positional embeddings (PEs) in vision transformers (ViTs). It shows that PEs act as geometric priors that shape spatial reasoning, and it uses experiments to measure their impact on multi-view geometric consistency.

Contribution

The paper introduces token-level diagnostics and provides an extensive analysis of how positional embeddings affect geometric structure in ViT representations.

Findings

01

Positional embeddings serve as geometric priors in ViTs.

02

They influence multi-view geometric consistency.

03

Positional embeddings can both help and hinder spatial reasoning.
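The following sketch makes the idea of a PE acting as a geometric prior more concrete. It is an illustration only, not the paper's analysis. It assumes fixed 2D sine-cosine embeddings (the MAE-style construction), whereas the 14 models studied may instead use learned, interpolated, or rotary PEs. With this construction, the cosine similarity between two tokens' embeddings depends only on their grid offset and is highest for nearby positions. The embedding space therefore encodes spatial proximity before any image content is seen. All function names here are hypothetical.

```python
import numpy as np


def sincos_1d(positions: np.ndarray, dim: int) -> np.ndarray:
    """Sinusoidal embedding of integer positions; returns shape (len(positions), dim)."""
    assert dim % 2 == 0, "dim must be even"
    # Frequencies decay geometrically from 1 toward 1/10000 (Transformer-style).
    omega = 1.0 / (10000.0 ** (np.arange(dim // 2) / (dim / 2)))
    angles = np.outer(positions, omega)  # (N, dim // 2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def sincos_2d(grid_size: int, dim: int) -> np.ndarray:
    """Assumed 2D PE for a grid_size x grid_size patch grid (row-major token order).

    Half of the channels encode the row index, the other half the column index.
    """
    assert dim % 4 == 0, "dim must be divisible by 4"
    ys, xs = np.meshgrid(np.arange(grid_size), np.arange(grid_size), indexing="ij")
    emb_y = sincos_1d(ys.ravel(), dim // 2)
    emb_x = sincos_1d(xs.ravel(), dim // 2)
    return np.concatenate([emb_y, emb_x], axis=1)  # (grid_size**2, dim)


def row_similarity_profile(grid_size: int, dim: int) -> np.ndarray:
    """Cosine similarity between the PE of token (0, 0) and token (0, d), d = 0..grid_size-1."""
    pe = sincos_2d(grid_size, dim)
    ref = pe[0]
    # In row-major order, token (0, d) sits at flat index d.
    row = pe[:grid_size]
    return row @ ref / (np.linalg.norm(row, axis=1) * np.linalg.norm(ref))
```

For example, `row_similarity_profile(14, 64)` covers a 14×14 grid, which corresponds to a 224-pixel image split into 16-pixel patches. The similarity is 1.0 at offset 0 and generally falls as the offset grows. The decline is not strictly monotonic, because the high-frequency channels oscillate.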

Abstract

This paper revisits the role of positional embeddings (PEs) within vision transformers (ViTs) from a geometric perspective. We show that PEs are not mere token indices but effectively function as geometric priors that shape the spatial structure of the representation. We introduce token-level diagnostics that measure how multi-view geometric consistency in ViT representations depends on consistent PEs. Through extensive experiments on 14 foundation ViT models, we reveal how PEs influence multi-view geometry and spatial reasoning. Our findings clarify the role of PEs as a causal mechanism that governs spatial structure in ViT representations. Our code is available at https://github.com/shijianjian/vit-geometry-probes
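The abstract does not define the diagnostics in detail. The following is therefore a hypothetical sketch of one possible token-level measure of multi-view consistency, not the authors' code, which is in the linked repository. It assumes that ground-truth token correspondences between two views are already known, for example from camera poses and depth. It computes two scores:

- how much more similar matched tokens are than arbitrary cross-view pairs;
- how often a token's nearest neighbour in the other view is its true match.

Comparing these scores for features extracted with consistent versus altered PEs would be one way to probe the dependence the abstract describes. All names are illustrative.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row (token feature) to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def cross_view_similarity(feats_a: np.ndarray, feats_b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every token of view A (N, C) and view B (M, C)."""
    return l2_normalize(feats_a) @ l2_normalize(feats_b).T  # (N, M)


def correspondence_margin(feats_a, feats_b, matches) -> float:
    """Mean similarity of matched token pairs minus mean similarity of all pairs.

    matches: (K, 2) int array of (index_in_a, index_in_b) pairs assumed to
    observe the same 3D point (hypothetical ground truth, e.g. poses + depth).
    """
    sim = cross_view_similarity(feats_a, feats_b)
    matched = sim[matches[:, 0], matches[:, 1]].mean()
    return float(matched - sim.mean())


def nn_match_accuracy(feats_a, feats_b, matches) -> float:
    """Fraction of matched A-tokens whose most similar B-token is the true match."""
    sim = cross_view_similarity(feats_a, feats_b)
    pred = sim[matches[:, 0]].argmax(axis=1)  # best B-token for each matched A-token
    return float((pred == matches[:, 1]).mean())
```

For example, if `feats_b` is a lightly perturbed permutation of `feats_a`, both scores are near their maximum. Unrelated random features instead give a margin near zero and roughly chance-level accuracy.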

Taxonomy

Topics: Spatial Cognition and Navigation · Action Observation and Synchronization · Visual perception and processing mechanisms