Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models

Chengcheng Wang; Jianyuan Guo; Hongguang Li; Yuchuan Tian; Ying Nie; Chang Xu; Kai Han

arXiv:2505.16416·cs.CV·May 22, 2026

TL;DR

Circle-RoPE remaps image-token positions onto an annulus orthogonal to the text position axis, so every text token is equidistant from all image tokens. This decouples text and image positions in vision-language models while preserving the spatial structure within each image.

Contribution

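The paper proposes three components: Per-Token Distance (PTD), a metric for cross-modal positional disentanglement; Circle-RoPE, a cone-like position encoding designed to satisfy the PTD = 0 criterion; and Alternating Geometry Encoding (AGE), which alternates between Circle-RoPE's decoupled geometry and a grid-based prior.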

Findings

01 Consistent improvements in spatial grounding across benchmarks.

02 Enhanced visual reasoning performance.

03 Effective elimination of geometric attention bias.

Abstract

Rotary Position Embedding (RoPE) is widely adopted in large language models, but when applied to vision-language models (VLMs) it couples text and image position indices and can introduce spurious cross-modal relative-position bias. We propose Per-Token Distance (PTD) to quantify cross-modal positional disentanglement, and prove that PTD = 0 is a sufficient condition to eliminate the geometric attention bias induced by RoPE. Guided by this criterion, we introduce Circle-RoPE, which remaps 2D image-token coordinates onto an annulus orthogonal to the text position axis, yielding a cone-like geometry where each text token is equidistant to all image tokens while preserving intra-image spatial structure. We further propose Alternating Geometry Encoding (AGE) to combine complementary geometric priors by alternating the decoupled geometry of Circle-RoPE and the grid-based prior of standard…
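
The cone-like geometry described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of the z-axis as the text position axis, the single fixed radius, and the evenly spaced angle assignment are all assumptions made for illustration. The sketch only checks the equidistance property (each text token is the same distance from every image token); it does not compute RoPE rotations or PTD, and its placeholder angle assignment makes no attempt to preserve the 2D layout of the image.

```python
import math


def image_positions_on_circle(height, width, radius=1.0):
    """Place height*width image tokens on a circle in the z = 0 plane.

    Assumption: angles are spaced evenly by row-major token index.
    This is a placeholder, not the mapping used in the paper.
    """
    n = height * width
    points = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        points.append((radius * math.cos(theta), radius * math.sin(theta), 0.0))
    return points


def text_position(t):
    """Place text token t on the z-axis, orthogonal to the image plane."""
    return (0.0, 0.0, float(t))


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.dist(p, q)


image_pts = image_positions_on_circle(height=3, width=4, radius=2.0)
for t in (1, 5):
    dists = [distance(text_position(t), p) for p in image_pts]
    spread = max(dists) - min(dists)
    # spread should be zero up to floating-point error
    print(f"text index {t}: distance ~ {dists[0]:.4f}, spread = {spread:.2e}")
```

Because every image token sits at radius r from the text axis, a text token at height t is at distance sqrt(r^2 + t^2) from all of them, which is the cone shape the title refers to.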

Code & Models

Repositories

lose4578/CircleRoPE (GitHub)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques