VLM-Social-Nav: Socially Aware Robot Navigation through Scoring using Vision-Language Models
Daeun Song, Jing Liang, Amirreza Payandeh, Amir Hossain Raj, Xuesu Xiao, Dinesh Manocha

TL;DR
VLM-Social-Nav introduces a vision-language model-based approach to socially aware robot navigation that raises the success rate and lowers the collision rate while keeping the robot's behavior human-friendly in real-world environments.
Contribution
The paper presents a novel VLM-based scoring method for socially compliant robot navigation, reducing training data reliance and improving adaptability.
Findings
27.38% improvement in success rate
19.05% reduction in collision rate
Most socially compliant behavior in user study
Abstract
We propose VLM-Social-Nav, a novel Vision-Language Model (VLM) based navigation approach to compute a robot's motion in human-centered environments. Our goal is to make real-time decisions on robot actions that are socially compliant with human expectations. We utilize a perception model to detect important social entities and prompt a VLM to generate guidance for socially compliant robot behavior. VLM-Social-Nav uses a VLM-based scoring module that computes a cost term that ensures socially appropriate and effective robot actions generated by the underlying planner. Our overall approach reduces reliance on large training datasets and enhances adaptability in decision-making. In practice, it results in improved socially compliant navigation in human-shared environments. We demonstrate and evaluate our system in four different real-world social navigation scenarios with a Turtlebot…
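The scoring idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Action` type, the absolute-difference form of `social_cost`, the toy `goal_cost`, and the weights `w_social` and `w_goal` are all hypothetical. The sketch only shows how a VLM-suggested action could enter a planner as one additional cost term when ranking candidate actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A candidate velocity command (hypothetical representation)."""
    linear: float   # forward velocity, m/s
    angular: float  # turn rate, rad/s


def social_cost(action: Action, guidance: Action) -> float:
    """Assumed form: distance between a candidate and the VLM-suggested action."""
    return (abs(action.linear - guidance.linear)
            + abs(action.angular - guidance.angular))


def goal_cost(action: Action, heading_error: float) -> float:
    """Toy planner term: heading error left after one unit time step."""
    return abs(heading_error - action.angular)


def select_action(candidates, guidance, heading_error,
                  w_social=1.0, w_goal=1.0):
    """Return the candidate with the lowest weighted total cost."""
    def total(a: Action) -> float:
        return (w_goal * goal_cost(a, heading_error)
                + w_social * social_cost(a, guidance))
    return min(candidates, key=total)


if __name__ == "__main__":
    candidates = [Action(0.5, 0.0), Action(0.2, 0.3), Action(0.5, -0.3)]
    # Hypothetical guidance parsed from a VLM reply such as "slow down and turn".
    guidance = Action(0.2, 0.3)
    print(select_action(candidates, guidance, heading_error=0.0))
```

With `w_social=0.0` the selection falls back to the goal term alone, which is one way to see that the social term only biases the underlying planner rather than replacing it.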
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Robotics and Automated Systems
