From Obstacles to Etiquette: Robot Social Navigation with VLM-Informed Path Selection
Zilin Fang, Anxing Xiao, David Hsu, and Gim Hee Lee

TL;DR
This paper introduces a social robot navigation system that combines geometric path planning with vision-language model-based social reasoning, enabling real-time, socially-aware navigation in human environments.
Contribution
The paper presents a framework that integrates VLM-based social reasoning with geometric planning to improve socially compliant robot navigation.
Findings
Achieves the lowest personal-space violation duration
Minimizes pedestrian-facing time
Prevents social zone intrusions
Abstract
Navigating socially in human environments requires more than satisfying geometric constraints, as collision-free paths may still interfere with ongoing activities or conflict with social norms. Addressing this challenge calls for analyzing interactions between agents and incorporating common-sense reasoning into planning. This paper presents a social robot navigation framework that integrates geometric planning with contextual social reasoning. The system first extracts obstacles and human dynamics to generate geometrically feasible candidate paths, then leverages a fine-tuned vision-language model (VLM) to evaluate these paths, informed by contextually grounded social expectations, selecting a socially optimized path for the controller. This task-specific VLM distills social reasoning from large foundation models into a smaller and efficient model, allowing the framework to perform…
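The two-stage idea described in the abstract (generate geometrically feasible candidates, then let a social evaluator pick among them) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `select_path` and `make_clearance_scorer` are hypothetical, the clearance-based scorer is only a toy stand-in for the fine-tuned VLM, and breaking ties by shorter path length is an assumption rather than something the abstract specifies.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Path = List[Point]


def path_length(path: Sequence[Point]) -> float:
    """Total Euclidean length of a polyline path."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def select_path(
    candidates: Sequence[Path],
    social_score: Callable[[Path], float],
) -> Path:
    """Pick the candidate with the highest social score.

    Assumption: ties are broken in favour of the shorter path.
    `social_score` is a placeholder for the VLM-based evaluator.
    """
    if not candidates:
        raise ValueError("no geometrically feasible candidate paths")
    return max(candidates, key=lambda p: (social_score(p), -path_length(p)))


def make_clearance_scorer(people: Sequence[Point]) -> Callable[[Path], float]:
    """Toy stand-in for the VLM (hypothetical): the score is the smallest
    distance between any waypoint and any person, so larger is better."""

    def score(path: Path) -> float:
        return min(math.dist(w, h) for w in path for h in people)

    return score


# Two hand-written candidates: a straight line that passes close to a
# person standing at (1.0, 0.2), and a detour that gives more room.
direct = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
detour = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]

scorer = make_clearance_scorer([(1.0, 0.2)])
best = select_path([direct, detour], scorer)  # the detour keeps more clearance
```

The point of the sketch is the separation of concerns: the geometric planner only has to produce feasible candidates, and the social evaluator can be swapped out without touching the planner or the controller.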
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robotic Path Planning Algorithms
