TL;DR
This paper introduces a deployable embodied vision-and-language navigation (VLN) system that combines strong high-level reasoning with efficient execution, using hierarchical cognition and context-aware exploration for real-world robotic navigation.
Contribution
The authors propose a hierarchical VLN system in which a fast perception-action layer and a deep reasoning layer run asynchronously and interact through a shared memory layer, enabling efficient long-horizon reasoning and real-time deployment on resource-limited robots.
Findings
Achieves higher navigation success rates than existing VLN methods.
Maintains real-time performance on resource-constrained hardware.
Demonstrates effectiveness in both simulation and real-world environments.
Abstract
Bridging the gap between embodied intelligence and embedded deployment remains a key challenge in intelligent robotic systems, where perception, reasoning, and planning must operate under strict constraints on computation, memory, energy, and real-time execution. In vision-and-language navigation (VLN), existing approaches often face a trade-off between reasoning capability and deployment efficiency on real-world platforms. In this paper, we present a deployable embodied VLN system that achieves both high efficiency and strong high-level reasoning on real-world robots. The system is decomposed into a fast perception-action layer and a deep reasoning layer running asynchronously at different time scales, with a shared memory layer enabling efficient interaction between them. To support long-horizon reasoning, we incrementally construct a compact memory graph and progressively feed…
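The asynchronous two-layer layout described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SharedMemory`, `fast_layer`, `slow_layer`, and `run`, the string-valued observations and subgoals, and the loop periods are all hypothetical stand-ins chosen for illustration. It shows only the general pattern of a fast loop and a slow loop running at different time scales and exchanging state through a lock-protected shared object.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """Hypothetical shared memory: observations so far plus the current subgoal."""
    observations: list = field(default_factory=list)
    subgoal: str = "explore"
    lock: threading.Lock = field(default_factory=threading.Lock)

    def add_observation(self, obs):
        with self.lock:
            self.observations.append(obs)

    def snapshot(self):
        # Return a copy so the reader never sees a list mutated mid-iteration.
        with self.lock:
            return list(self.observations)

    def set_subgoal(self, subgoal):
        with self.lock:
            self.subgoal = subgoal

    def get_subgoal(self):
        with self.lock:
            return self.subgoal


def fast_layer(memory, stop, actions, period=0.01):
    """Stand-in for the perception-action layer: short period, never waits on reasoning."""
    step = 0
    while not stop.is_set():
        memory.add_observation(f"frame-{step}")       # placeholder for perception
        actions.append((step, memory.get_subgoal()))  # act on the latest subgoal
        step += 1
        time.sleep(period)


def slow_layer(memory, stop, period=0.05):
    """Stand-in for the reasoning layer: longer period, reads memory and updates the subgoal."""
    while not stop.is_set():
        seen = memory.snapshot()
        memory.set_subgoal(f"goal-after-{len(seen)}-frames")  # placeholder for reasoning
        time.sleep(period)


def run(duration=0.3):
    """Run both layers concurrently for `duration` seconds, then stop and join them."""
    memory = SharedMemory()
    stop = threading.Event()
    actions = []
    threads = [
        threading.Thread(target=fast_layer, args=(memory, stop, actions)),
        threading.Thread(target=slow_layer, args=(memory, stop)),
    ]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return memory, actions


if __name__ == "__main__":
    memory, actions = run()
    print(len(actions), "actions; final subgoal:", memory.get_subgoal())
```

In this sketch the fast loop always acts on whatever subgoal is currently stored, so a slow reasoning step delays only the subgoal update and never the control loop itself. A real system would replace the placeholder strings with sensor data, a memory graph, and model inference.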
