Evaluating VLMs' Spatial Reasoning Over Robot Motion: A Step Towards Robot Planning with Motion Preferences

Wenxi Wu; Jingjing Zhang; Martim Brandão

arXiv:2603.13100 · cs.RO · March 16, 2026

Open Access

TL;DR

This paper assesses the spatial reasoning abilities of state-of-the-art Vision-Language Models in robot motion planning, highlighting their potential and limitations in understanding user preferences and constraints in both zero-shot and fine-tuned settings.

Contribution

It provides a systematic evaluation of VLMs' spatial reasoning in robot motion tasks, introducing querying methods and analyzing performance trade-offs.

Findings

1. Qwen2.5-VL achieves 71.4% zero-shot accuracy
2. Fine-tuning improves accuracy to 75%
3. GPT-4o performs less effectively

Abstract

Understanding user instructions and object spatial relations in surrounding environments is crucial for intelligent robot systems to assist humans in various tasks. The natural language and spatial reasoning capabilities of Vision-Language Models (VLMs) have the potential to enhance the generalization of robot planners to new tasks, objects, and motion specifications. While foundation models have been applied to task planning, it remains unclear to what degree they have the spatial reasoning capability required to enforce user preferences or constraints on motion, such as desired distances from objects, topological properties, or motion style preferences. In this paper, we evaluate the capability of four state-of-the-art VLMs at spatial reasoning over robot motion, using four different querying methods. Our results show that, with the highest-performing querying method,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Robotic Path Planning Algorithms