MUSON: A Reasoning-oriented Multimodal Dataset for Socially Compliant Navigation in Urban Environments
Zhuonan Liu, Xinyu Zhang, Zishuo Wang, Tomohito Kawabata, Xuesu Xiao, Ling Xiao

TL;DR
MUSON is a new multimodal dataset for reasoning-based, socially compliant navigation in urban environments. It features explicit reasoning annotations and diverse indoor and outdoor campus scenes, with the aim of improving safety-critical decision-making in autonomous systems.
Contribution
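The paper introduces MUSON, a structured, reasoning-oriented dataset with explicit annotations. It addresses two limitations of previous social navigation datasets, the lack of explicit reasoning supervision and highly long-tailed action distributions, so that models can better learn safety-critical behaviors.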
Findings
Qwen2.5-VL-3B achieves 86.25% decision accuracy on MUSON.
Compared to SNEI, MUSON provides consistent reasoning, action, and explanation annotations.
Benchmarking multiple state-of-the-art small vision language models demonstrates MUSON's effectiveness as a benchmark for socially compliant navigation.
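The summary does not define "decision accuracy." A common reading, assumed in this sketch, is exact-match accuracy over the single discrete action chosen per sample. The function name `decision_accuracy` and the action labels below are hypothetical illustrations, not MUSON's evaluation code or action vocabulary.

```python
from typing import Sequence


def decision_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of samples whose predicted discrete action equals the reference action.

    Assumption: accuracy is exact match on one action label per sample.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical labels for illustration only; not MUSON's actual action set.
preds = ["stop", "go_straight", "turn_left", "go_straight"]
refs = ["stop", "go_straight", "turn_right", "go_straight"]
print(f"{decision_accuracy(preds, refs):.2%}")  # 3 of 4 correct -> 75.00%
```

Under this reading, a reported figure such as 86.25% is simply the share of test samples on which the model picked the annotated action.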
Abstract
Socially compliant navigation requires structured reasoning over dynamic pedestrians and physical constraints to ensure safe and interpretable decisions. However, existing social navigation datasets often lack explicit reasoning supervision and exhibit highly long-tailed action distributions, limiting models' ability to learn safety-critical behaviors. To address these issues, we introduce MUSON, a multimodal dataset for short-horizon social navigation collected across diverse indoor and outdoor campus scenes. MUSON adopts a structured five-step Chain-of-Thought annotation consisting of perception, prediction, reasoning, action, and explanation, with explicit modeling of static physical constraints and a rationally balanced discrete action space. Compared to SNEI, MUSON provides consistent reasoning, action, and explanation annotations. Benchmarking multiple state-of-the-art Small Vision Language…
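The five-step Chain-of-Thought annotation can be pictured as one record per sample. The sketch below is a hypothetical schema, not MUSON's released format: the field names, the four-action `Action` vocabulary, and the `imbalance_ratio` helper (a simple max/min count ratio used here to quantify how long-tailed an action distribution is) are all assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical discrete action space; MUSON's real action set is not listed here.
    STOP = "stop"
    GO_STRAIGHT = "go_straight"
    TURN_LEFT = "turn_left"
    TURN_RIGHT = "turn_right"


@dataclass(frozen=True)
class CoTAnnotation:
    """One sample's five-step Chain-of-Thought annotation (illustrative field names)."""
    perception: str   # pedestrians and static physical constraints in view
    prediction: str   # expected short-horizon motion of pedestrians
    reasoning: str    # why candidate actions are safe or unsafe
    action: Action    # the chosen discrete action
    explanation: str  # human-readable justification of the action


def imbalance_ratio(samples: list[CoTAnnotation]) -> float:
    """Ratio of the most to the least frequent action; 1.0 means perfectly balanced.

    Any action that never occurs counts as zero and yields infinity,
    the extreme case of a long-tailed distribution.
    """
    counts = Counter(s.action for s in samples)
    freqs = [counts.get(a, 0) for a in Action]
    if min(freqs) == 0:
        return float("inf")
    return max(freqs) / min(freqs)


sample = CoTAnnotation(
    perception="A pedestrian is crossing ahead; a bench blocks the right side.",
    prediction="The pedestrian will reach the center of the path within a few seconds.",
    reasoning="Turning right hits the bench; continuing straight would cut off the pedestrian.",
    action=Action.STOP,
    explanation="Stop and yield until the pedestrian has passed.",
)
print(imbalance_ratio([sample]))  # only STOP is present -> inf
```

Keeping the action as an enum rather than free text makes the action step directly scorable, while the other four steps stay as natural-language supervision for reasoning.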
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robotics and Sensor-Based Localization
