Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces
Amirreza Payandeh, Daeun Song, Mohammad Nazeri, Jing Liang, Praneel Mukherjee, Amir Hossain Raj, Yangzhe Kong, Dinesh Manocha, and Xuesu Xiao

TL;DR
This paper introduces Social-LLaVA, a vision-language model fine-tuned on a new dataset, SNEI, that uses language to bridge robot perception and socially aware actions in dynamic, crowded environments.
Contribution
The paper presents a novel dataset and a fine-tuned vision-language model that improves socially aware robot navigation through language reasoning.
Findings
Social-LLaVA outperforms GPT-4V and Gemini in VQA tasks.
The SNEI dataset contains 40K human-annotated VQAs based on 2K human-robot social interactions in unstructured, crowded public spaces.
Deployment demonstrates human-like reasoning in real robot navigation.
Abstract
Most existing social robot navigation techniques either leverage hand-crafted rules or human demonstrations to connect robot perception to socially compliant actions. However, there remains a significant gap in effectively translating perception into socially compliant actions, much like how human reasoning naturally occurs in dynamic environments. Considering the recent success of Vision-Language Models (VLMs), we propose using language to bridge the gap in human-like reasoning between perception and socially aware robot actions. We create a vision-language dataset, Social robot Navigation via Explainable Interactions (SNEI), featuring 40K human-annotated Visual Question Answers (VQAs) based on 2K human-robot social interactions in unstructured, crowded public spaces, spanning perception, prediction, chain-of-thought reasoning, action, and explanation. We fine-tune a VLM, Social-LLaVA,…
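As a rough illustration of the dataset structure described in the abstract, the sketch below models one VQA record tagged with one of the five categories the abstract names. The class names, field names, and the sample question and answer are hypothetical placeholders, not the actual SNEI schema or data; only the category list and the 40K/2K counts come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class QuestionType(Enum):
    # The five categories named in the abstract.
    PERCEPTION = "perception"
    PREDICTION = "prediction"
    CHAIN_OF_THOUGHT = "chain-of-thought reasoning"
    ACTION = "action"
    EXPLANATION = "explanation"


@dataclass(frozen=True)
class VQAExample:
    # Hypothetical field names; the real SNEI format is not specified here.
    interaction_id: str
    question_type: QuestionType
    question: str
    answer: str


def average_vqas_per_interaction(num_vqas: int, num_interactions: int) -> float:
    """Average number of VQA pairs attached to each interaction."""
    return num_vqas / num_interactions


# Made-up placeholder record, for illustration only.
example = VQAExample(
    interaction_id="interaction-0001",
    question_type=QuestionType.PERCEPTION,
    question="Placeholder question about the scene.",
    answer="Placeholder answer.",
)

# Counts stated in the abstract: 40K VQAs over 2K interactions.
avg = average_vqas_per_interaction(40_000, 2_000)
print(f"{avg:.0f} VQAs per interaction on average")
```

Under these counts, each interaction carries about 20 VQA pairs on average; how they are distributed across the five categories is not stated in the abstract.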
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Robotics and Automated Systems
Methods: Attentive Walk-Aggregating Graph Neural Network
