Probing Prompt Design for Socially Compliant Robot Navigation with Vision Language Models
Ling Xiao, Toshihiko Yamasaki

TL;DR
This paper investigates how prompt design influences socially compliant robot navigation with vision language models, showing that both the competition framing and the type of system prompt substantially affect navigation performance.
Contribution
The paper presents a systematic study of two prompt design dimensions, system guidance and motivational framing, and their effects on navigation performance, emphasizing decision-level constraints over representational improvements.
Findings
Competition against humans yields the best results for non-finetuned models.
Inappropriate prompts can significantly degrade performance.
Prompt design affects action accuracy more than semantic understanding.
Abstract
Language models are increasingly used for social robot navigation, yet existing benchmarks largely overlook principled prompt design for socially compliant behavior. This limitation is particularly relevant in practice, as many systems rely on small vision language models (VLMs) for efficiency. Compared to large language models, small VLMs exhibit weaker decision-making capabilities, making effective prompt design critical for accurate navigation. Inspired by cognitive theories of human learning and motivation, we study prompt design along two dimensions: system guidance (action-focused, reasoning-oriented, and perception-reasoning prompts) and motivational framing, where models compete against humans, other AI systems, or their past selves. Experiments on two socially compliant navigation datasets reveal three key findings. First, for non-finetuned GPT-4o, competition against humans…
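To make the two prompt design dimensions concrete, the sketch below composes a system prompt from one system-guidance style and one motivational framing. All prompt wording, dictionary keys, and the helpers `build_system_prompt` and `prompt_grid` are hypothetical illustrations, not the authors' prompts. The sketch also assumes the two dimensions are crossed into a full 3 × 3 grid of conditions, which the abstract does not state explicitly.

```python
from itertools import product

# Hypothetical wording for the three system-guidance styles named in the abstract.
SYSTEM_GUIDANCE = {
    "action-focused": "Output only the next navigation action.",
    "reasoning-oriented": (
        "Reason about the social situation, then output the next navigation action."
    ),
    "perception-reasoning": (
        "First describe the people and obstacles you see, reason about them, "
        "then output the next navigation action."
    ),
}

# Hypothetical wording for the three competition targets named in the abstract.
MOTIVATIONAL_FRAMING = {
    "humans": "You are competing against human navigators; try to outperform them.",
    "other AI systems": "You are competing against other AI systems; try to outperform them.",
    "past self": "You are competing against your previous attempts; try to improve on them.",
}


def build_system_prompt(guidance: str, framing: str) -> str:
    """Combine one system-guidance style with one motivational framing."""
    return f"{SYSTEM_GUIDANCE[guidance]} {MOTIVATIONAL_FRAMING[framing]}"


def prompt_grid() -> dict[tuple[str, str], str]:
    """Enumerate every (guidance, framing) pair as a separate prompt condition."""
    return {
        (g, f): build_system_prompt(g, f)
        for g, f in product(SYSTEM_GUIDANCE, MOTIVATIONAL_FRAMING)
    }


if __name__ == "__main__":
    grid = prompt_grid()
    print(len(grid))  # 9 conditions (3 guidance styles x 3 framings)
    print(grid[("action-focused", "humans")])
```

Each resulting string would be sent as the system message alongside the robot's camera observation, so that the effect of each condition on action accuracy can be compared under otherwise identical inputs.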
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Action Observation and Synchronization
