The Cartesian Shortcut: Re-evaluate Vision Reasoning in Polar Coordinate Space

Xia Hu; Zhenrui Yue; Brian Potetz; Howard Zhou; Leonidas Guibas; Chun-Ta Lu; Zhicheng Wang

arXiv:2605.09883·cs.CV·May 12, 2026

TL;DR

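This paper argues that current multimodal large language models rely on a "Cartesian shortcut" in visual reasoning tasks: grid-based layouts can be discretized into textual coordinates and then solved largely through text-based deduction. Reformulating the same tasks in Polar coordinates breaks this shortcut and exposes the models' lack of topology-invariant reasoning.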

Contribution

The authors introduce Polaris-Bench, a benchmark that reformulates 53 visual reasoning tasks in Polar coordinates, each paired with a Cartesian counterpart that keeps the same logical constraints and task semantics, to measure how much models rely on Cartesian shortcuts.

Findings

01

Model performance drops from 70-83% on Cartesian layouts to 31-39% on the Polar equivalents.

02

Reasoning improvements on Cartesian layouts do not transfer to Polar equivalents.

03

Current models lack topology-invariant visual reasoning capabilities.

Abstract

As current Multimodal Large Language Models rapidly saturate canonical visual reasoning benchmarks, a key question emerges: do these strong scores genuinely reflect robust visual understanding? We identify a pervasive vulnerability, the Cartesian Shortcut: visual reasoning benchmarks prevalently build on orthogonal grid-based layouts that can be readily discretized into explicit textual coordinates. Models systematically exploit this property, heavily leveraging text-based deductive reasoning to assist visual problem-solving. To systematically dismantle this shortcut, we introduce Polaris-Bench, which re-formulates 53 visual reasoning tasks in Polar coordinate space with paired Cartesian counterparts as reference, while preserving consistent logical constraints and task semantics, thus fundamentally breaking the orthogonal prior that models exploit. Comprehensive…
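
To make the idea of a paired Cartesian/Polar layout concrete, the sketch below maps a cell of an orthogonal grid onto a ring-and-sector cell of a polar layout. It is only an illustration under stated assumptions, not the benchmark's construction: the function names `grid_to_polar` and `polar_to_xy`, the parameters `r_inner` and `ring_width`, and the choice of rows as rings and columns as sectors are all hypothetical, and the source does not say whether angular wrap-around adjacency is used.

```python
import math


def grid_to_polar(row, col, n_rows, n_cols, r_inner=1.0, ring_width=1.0):
    """Return the centre (r, theta) of the polar cell paired with grid cell (row, col).

    Assumed convention (hypothetical, not from the paper): each grid row
    becomes a concentric ring and each grid column becomes an angular sector.
    """
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("cell index out of range")
    # Radial centre of the ring that replaces this row.
    r = r_inner + (row + 0.5) * ring_width
    # Angular centre of the sector that replaces this column, in radians.
    theta = 2.0 * math.pi * (col + 0.5) / n_cols
    return r, theta


def polar_to_xy(r, theta):
    """Convert a polar point to image-plane (x, y) for rendering."""
    return r * math.cos(theta), r * math.sin(theta)


if __name__ == "__main__":
    # Place every cell of a 3 x 4 grid on its polar counterpart.
    for row in range(3):
        for col in range(4):
            r, theta = grid_to_polar(row, col, n_rows=3, n_cols=4)
            x, y = polar_to_xy(r, theta)
            print(f"cell ({row}, {col}) -> r={r:.2f}, theta={theta:.3f}, xy=({x:.2f}, {y:.2f})")
```

Under this mapping the cell indices, and therefore any logical constraints stated over them, are unchanged; only the rendered geometry differs, so the picture no longer lines up with a simple row/column text transcription.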
