FSUNav: A Cerebrum-Cerebellum Architecture for Fast, Safe, and Universal Zero-Shot Goal-Oriented Navigation
Mingao Tan, Yiyang Li, Shanze Wang, Xinming Zhang, Wei Zhang

TL;DR
FSUNav is a cerebrum-cerebellum architecture that pairs vision-language models with a high-frequency local planner to deliver fast, safe, and universal zero-shot goal-oriented navigation across heterogeneous robotic platforms.
Contribution
The paper proposes a cerebrum-cerebellum architecture that combines a deep reinforcement learning local planner (the cerebellum) with vision-language model reasoning (the cerebrum) for zero-shot navigation.
Findings
Achieves state-of-the-art results on MP3D, HM3D, and OVON benchmarks.
Demonstrates robust real-world deployment on various robotic platforms.
Significantly outperforms existing navigation methods in safety and efficiency.
Abstract
Current vision-language navigation methods face substantial bottlenecks regarding heterogeneous robot compatibility, real-time performance, and navigation safety. Furthermore, they struggle to support open-vocabulary semantic generalization and multimodal task inputs. To address these challenges, this paper proposes FSUNav: a Cerebrum-Cerebellum architecture for fast, safe, and universal zero-shot goal-oriented navigation, which innovatively integrates vision-language models (VLMs) with the proposed architecture. The cerebellum module, a high-frequency end-to-end module, develops a universal local planner based on deep reinforcement learning, enabling unified navigation across heterogeneous platforms (e.g., humanoid, quadruped, wheeled robots) to improve navigation efficiency while significantly reducing collision risk. The cerebrum module constructs a three-layer reasoning model and…
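The split described in the abstract, a slow reasoning module steering a high-frequency local planner, can be illustrated with a two-rate control loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: `DualRateLoop`, `slow_planner`, `fast_policy`, `slow_every`, and `step_toward` are hypothetical names, the "cerebrum" is a stub that returns a fixed subgoal instead of querying a VLM, and the "cerebellum" is a simple capped step toward that subgoal instead of a learned DRL policy.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec = Tuple[float, float]


@dataclass
class DualRateLoop:
    """Hypothetical two-rate loop: a slow planner sets subgoals, a fast policy acts every tick."""
    slow_planner: Callable[[Vec], Vec]       # stand-in for the cerebrum (VLM reasoning)
    fast_policy: Callable[[Vec, Vec], Vec]   # stand-in for the cerebellum (local planner)
    slow_every: int = 10                     # assumed ratio of fast ticks per slow update

    def run(self, start: Vec, ticks: int) -> List[Vec]:
        pos = start
        subgoal = self.slow_planner(pos)     # initial subgoal before any motion
        path = [pos]
        for t in range(1, ticks + 1):
            if t % self.slow_every == 0:     # slow module refreshes the subgoal occasionally
                subgoal = self.slow_planner(pos)
            vx, vy = self.fast_policy(pos, subgoal)  # fast module runs on every tick
            pos = (pos[0] + vx, pos[1] + vy)
            path.append(pos)
        return path


def step_toward(pos: Vec, goal: Vec, max_step: float = 0.5) -> Vec:
    """Toy local policy: move straight at the goal, capped at max_step per tick."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return (dx, dy)
    scale = max_step / dist
    return (dx * scale, dy * scale)


if __name__ == "__main__":
    loop = DualRateLoop(slow_planner=lambda pos: (3.0, 4.0), fast_policy=step_toward)
    path = loop.run(start=(0.0, 0.0), ticks=20)
    print(path[-1])
```

The point of the sketch is only the rate separation: the expensive call happens once every `slow_every` ticks, while the cheap policy reacts on every tick, which is the property the abstract attributes to the high-frequency cerebellum module.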
