SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning
Tomohito Kawabata, Xinyu Zhang, and Ling Xiao

TL;DR
This paper introduces SocialNav-MoE, an efficient Mixture-of-Experts vision language model designed for socially compliant robot navigation, leveraging reinforcement fine-tuning and semantic similarity rewards to balance accuracy and computational efficiency.
Contribution
The paper proposes SocialNav-MoE, a novel mixture-of-experts model for socially aware navigation, with reinforcement fine-tuning and a new semantic similarity reward to improve decision-making.
Findings
SocialNav-MoE balances navigation accuracy and efficiency effectively.
Semantic similarity reward outperforms other reward types.
Small VLMs with specific routing strategies enhance real-time navigation.
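The summary here does not say how the semantic similarity reward is computed, so the following is only a minimal sketch of the general idea: score a generated navigation decision by how close it is in meaning to a reference answer, rather than by exact string match. Everything in it is an assumption made for illustration. The function names are hypothetical, and the bag-of-words vector is a toy stand-in for whatever sentence encoder an actual system would use.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy text embedding: lowercase word counts.

    Assumption: stands in for a learned sentence encoder.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text carries no signal
    return dot / (norm_a * norm_b)


def semantic_similarity_reward(prediction: str, reference: str) -> float:
    """Hypothetical reward: graded similarity in [0, 1]."""
    return cosine_similarity(bag_of_words(prediction), bag_of_words(reference))


def exact_match_reward(prediction: str, reference: str) -> float:
    """Baseline for comparison: 1.0 only when the strings match exactly."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0


if __name__ == "__main__":
    reference = "slow down and pass behind the pedestrian"
    prediction = "slow down and yield to the pedestrian"
    print("exact match:", exact_match_reward(prediction, reference))
    print("similarity :", semantic_similarity_reward(prediction, reference))
```

In this sketch a paraphrased but reasonable decision receives partial credit under the similarity reward, while an exact-match reward gives it none. That graded signal is one plausible reason a similarity-based reward could be easier to optimize during reinforcement fine-tuning.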
Abstract
For robots navigating in human-populated environments, safety and social compliance are equally critical, yet prior work has mostly emphasized safety. Socially compliant navigation that accounts for human comfort, social norms, and contextual appropriateness remains underexplored. Vision language models (VLMs) show promise for this task; however, large-scale models incur substantial computational overhead, leading to higher inference latency and energy consumption, which makes them unsuitable for real-time deployment on resource-constrained robotic platforms. To address this issue, we investigate the effectiveness of small VLMs and propose SocialNav-MoE, an efficient Mixture-of-Experts vision language model for socially compliant navigation with reinforcement fine-tuning (RFT). We further introduce a semantic similarity reward (SSR) to effectively leverage RFT for enhancing the…
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Advanced Neural Network Applications
