SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning

Tomohito Kawabata, Xinyu Zhang, and Ling Xiao

arXiv:2512.14757 · cs.CV · December 18, 2025

TL;DR

This paper introduces SocialNav-MoE, an efficient Mixture-of-Experts vision language model designed for socially compliant robot navigation, leveraging reinforcement fine-tuning and semantic similarity rewards to balance accuracy and computational efficiency.

Contribution

The paper proposes SocialNav-MoE, a Mixture-of-Experts vision language model for socially compliant navigation, trained with reinforcement fine-tuning (RFT) and a new semantic similarity reward (SSR) intended to improve decision-making.

Findings

01. SocialNav-MoE balances navigation accuracy and efficiency effectively.

02. Semantic similarity reward outperforms other reward types.

03. Small VLMs with specific routing strategies enhance real-time navigation.

Abstract

For robots navigating in human-populated environments, safety and social compliance are equally critical, yet prior work has mostly emphasized safety. Socially compliant navigation that accounts for human comfort, social norms, and contextual appropriateness remains underexplored. Vision language models (VLMs) show promise for this task; however, large-scale models incur substantial computational overhead, leading to higher inference latency and energy consumption, which makes them unsuitable for real-time deployment on resource-constrained robotic platforms. To address this issue, we investigate the effectiveness of small VLMs and propose SocialNav-MoE, an efficient Mixture-of-Experts vision language model for socially compliant navigation with reinforcement fine-tuning (RFT). We further introduce a semantic similarity reward (SSR) to effectively leverage RFT for enhancing the…
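
The abstract is cut off before it defines the semantic similarity reward, so the following is only a minimal sketch of the general idea of a similarity-based reward, not the paper's formulation. It assumes the reward is a cosine similarity between a predicted navigation instruction and a reference one. The bag-of-words `embed` function is a toy stand-in for a learned sentence encoder, and all names here (`embed`, `cosine`, `semantic_similarity_reward`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts.

    Assumption: a real system would use a learned text embedding instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    # An empty text has no direction, so treat its similarity as zero.
    return dot / norm if norm else 0.0


def semantic_similarity_reward(prediction: str, reference: str) -> float:
    """Hypothetical reward: similarity of predicted vs. reference action text."""
    return cosine(embed(prediction), embed(reference))


if __name__ == "__main__":
    reference = "slow down and pass the pedestrians on the left"
    for prediction in (
        "slow down and pass the pedestrians on the left",
        "pass the pedestrians on the left slowly",
        "accelerate straight ahead",
    ):
        print(prediction, "->", semantic_similarity_reward(prediction, reference))
```

Unlike an exact-match reward, a graded score of this kind gives partial credit to outputs that are close in meaning to the reference, which is one plausible motivation for using similarity as a reinforcement signal.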

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Advanced Neural Network Applications