TraversalBench: Challenging Paths to Follow for Vision Language Models

Clara Petrova; Zhuo Chen; Marin Soljačić

arXiv:2604.10999 · cs.CV · April 14, 2026


TL;DR

TraversalBench is a new benchmark designed to evaluate vision-language models' ability to follow complex visual paths, highlighting the impact of self-intersections and confounding lines on performance.

Contribution

The paper introduces a controlled, diagnostic benchmark for assessing path-following visual reasoning in multimodal models, emphasizing structural factors and error localization.

Findings

01

Self-intersections are the main source of difficulty for models.

02

Performance drops sharply after the first crossing in the path.

03

Layouts favoring left-to-right reading order are more common but do not fully explain performance.

Abstract

Vision-language models (VLMs) perform strongly on many multimodal benchmarks. However, the ability to follow complex visual paths -- a task that human observers typically find straightforward -- remains under-tested. We introduce TraversalBench, a controlled benchmark for exact visual path traversal. Each instance contains a single continuous polyline, a unique start marker, and markers placed at path vertices; the task is to recover the exact ordered sequence encountered when traversing the path from start to finish. The benchmark explicitly balances key path-structural factors including self-intersection count, tortuosity, vertex count, and nearby confounding lines, while minimizing reliance on OCR, world knowledge, and open-ended planning. We find that self-intersections are the dominant source of difficulty. A first-crossing analysis shows that errors are sharply localized:…
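To make the structural factors named in the abstract concrete, here is a minimal Python sketch of how self-intersection count, tortuosity, and exact-sequence scoring could be computed for a polyline. It is an illustration under stated assumptions, not the benchmark's own code: the function names are hypothetical, tortuosity is assumed to be the arc-length-to-chord ratio (the paper may define it differently), and only proper crossings between non-adjacent segments are counted.

```python
from itertools import combinations
from math import hypot

# All names below are hypothetical; they do not come from the paper.


def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 cross at a single interior point."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def self_intersection_count(path):
    """Count proper crossings between non-adjacent segments of a polyline."""
    segs = list(zip(path, path[1:]))
    return sum(
        segments_cross(*segs[i], *segs[j])
        for i, j in combinations(range(len(segs)), 2)
        if j - i > 1  # adjacent segments share a vertex, so skip them
    )


def tortuosity(path):
    """Assumed definition: total path length divided by start-to-end distance."""
    length = sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:]))
    chord = hypot(path[-1][0] - path[0][0], path[-1][1] - path[0][1])
    return length / chord if chord else float("inf")


def exact_match(predicted, truth):
    """Score 1 only if the predicted marker sequence equals the true one exactly."""
    return list(predicted) == list(truth)
```

For example, the path `[(0, 0), (2, 2), (2, 0), (0, 2)]` has one self-intersection, where its first and last segments cross at `(1, 1)`, while a straight path such as `[(0, 0), (1, 0), (2, 0)]` has none and a tortuosity of 1.0 under the assumed definition.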
