NaviTrace: Evaluating Embodied Navigation of Vision-Language Models

Tim Windecker; Manthan Patel; Moritz Reuss; Richard Schwarzkopf; Cesar Cadena; Rudolf Lioutikov; Marco Hutter; Jonas Frey

arXiv:2510.26909·cs.RO·March 10, 2026

Open Access · 1 dataset

TL;DR

NaviTrace is a new benchmark for evaluating vision-language models' robotic navigation capabilities across diverse scenarios, highlighting current gaps in spatial grounding and goal localization compared to humans.

Contribution

Introduces NaviTrace, a comprehensive benchmark with a novel semantic-aware trace score for assessing embodied navigation of vision-language models.

Findings

01 Eight state-of-the-art VLMs fall significantly short of human performance.

02 The benchmark shows that the models struggle with spatial grounding and goal localization.

03 NaviTrace provides a scalable, reproducible platform for future research.

Abstract

Vision-language models demonstrate unprecedented performance and generalization across a wide range of tasks and scenarios. Integrating these foundation models into robotic navigation systems opens pathways toward building general-purpose robots. Yet, evaluating these models' navigation capabilities remains constrained by costly real-world trials, overly simplified simulations, and limited benchmarks. We introduce NaviTrace, a high-quality Visual Question Answering benchmark where a model receives an instruction and embodiment type (human, legged robot, wheeled robot, bicycle) and must output a 2D navigation trace in image space. Across 1000 scenarios and more than 3000 expert traces, we systematically evaluate eight state-of-the-art VLMs using a newly introduced semantic-aware trace score. This metric combines Dynamic Time Warping distance, goal endpoint error, and…
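
The abstract names Dynamic Time Warping (DTW) distance and goal endpoint error as two ingredients of the trace score. The sketch below illustrates those two quantities in isolation for traces given as lists of 2D image-space points. It is not the authors' scoring code: the function names `dtw_distance` and `endpoint_error` are hypothetical, the Euclidean point cost is an assumption, and how the paper weights or combines the terms (and any further terms) is not reproduced here.

```python
import math

def dtw_distance(trace_a, trace_b):
    """Classic O(n*m) Dynamic Time Warping cost between two 2D point sequences.

    Assumption: the per-point cost is the Euclidean distance in pixels.
    """
    n, m = len(trace_a), len(trace_b)
    if n == 0 or m == 0:
        raise ValueError("both traces must contain at least one point")
    # cost[i][j] = minimal accumulated cost of aligning the first i points
    # of trace_a with the first j points of trace_b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(trace_a[i - 1], trace_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance only in trace_a
                cost[i][j - 1],      # advance only in trace_b
                cost[i - 1][j - 1],  # advance in both traces
            )
    return cost[n][m]

def endpoint_error(predicted, reference):
    """Euclidean distance between the final points of two traces."""
    return math.dist(predicted[-1], reference[-1])

# Toy example: a predicted trace running one pixel above a reference trace.
predicted = [(0, 0), (1, 0), (2, 0)]
reference = [(0, 1), (1, 1), (2, 1)]
dtw_value = dtw_distance(predicted, reference)
end_value = endpoint_error(predicted, reference)
```

Because DTW aligns points elastically, traces sampled with different numbers of points can still be compared, which matters when a model and a human annotator draw the same path at different resolutions.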

Code & Models

Datasets

leggedrobotics/navitrace
dataset · 443 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Advanced Neural Network Applications