Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation
Pingrui Zhang, Yifei Su, Pengyuan Wu, Dong An, Li Zhang, Zhigang Wang, Dong Wang, Yan Ding, Bin Zhao, Xuelong Li

TL;DR
This paper introduces the Adaptive Text Dreamer (ATD), a vision-and-language navigation approach built on a dual-branch language model architecture inspired by the left and right hemispheres of the human brain. It improves reasoning and scene imagination and achieves state-of-the-art results on the R2R benchmark.
Contribution
The paper presents a new dual-branch LLM-based architecture with a cross-interaction mechanism for VLN, enhancing reasoning and imagination efficiency over existing vision-based methods.
Findings
Achieves state-of-the-art performance on the R2R benchmark.
Uses fewer parameters than previous models.
Demonstrates effective logical reasoning and scene imagination.
Abstract
Vision-and-Language Navigation (VLN) requires the agent to navigate by following natural instructions under partial observability, making it difficult to align perception with language. Recent methods mitigate this by imagining future scenes, yet they rely on vision-based synthesis, leading to high computational cost and redundant details. To this end, we propose to adaptively imagine key environmental semantics via language form, enabling a more reliable and efficient strategy. Specifically, we introduce a novel Adaptive Text Dreamer (ATD), a dual-branch self-guided imagination policy built upon a large language model (LLM). ATD is designed with a human-like left-right brain architecture, where the left brain focuses on logical integration, and the right brain is responsible for imaginative prediction of future scenes. To achieve this, we fine-tune only the Q-former within…
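The dual-branch idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: there is no LLM or Q-former here, and every name below (`Candidate`, `LANDMARKS`, `left_brain`, `right_brain`, `choose_action`) is hypothetical. It only assumes that one branch summarises progress as text, the other "imagines" upcoming scene semantics as words, and the two outputs together guide the choice of the next direction.

```python
from __future__ import annotations

from dataclasses import dataclass

# Toy landmark vocabulary (an assumption for illustration only).
LANDMARKS = {"kitchen", "hallway", "bedroom", "stairs", "sofa"}


@dataclass
class Candidate:
    """A navigable direction with a text description of what is visible there."""
    name: str
    description: str


def left_brain(instruction: str, history: list[str]) -> str:
    """Hypothetical logical branch: summarise the instruction and progress as text."""
    done = ", ".join(history) if history else "nothing yet"
    return f"Instruction: {instruction} Visited so far: {done}."


def right_brain(instruction: str, history: list[str]) -> set[str]:
    """Hypothetical imaginative branch: predict, as words, which landmarks
    mentioned in the instruction have not been visited yet."""
    seen = {w.lower() for w in " ".join(history).split()}
    words = {w.strip(".,").lower() for w in instruction.split()}
    return {w for w in words if w in LANDMARKS and w not in seen}


def choose_action(
    instruction: str, history: list[str], candidates: list[Candidate]
) -> tuple[str, str, set[str]]:
    """Combine both branches: pick the candidate whose description overlaps
    most with the imagined landmarks, and return the reasoning text too."""
    reasoning = left_brain(instruction, history)
    imagined = right_brain(instruction, history)

    def score(c: Candidate) -> int:
        return len(imagined & set(c.description.lower().split()))

    best = max(candidates, key=score)  # ties resolve to the first candidate
    return best.name, reasoning, imagined


if __name__ == "__main__":
    action, reasoning, imagined = choose_action(
        "Walk through the hallway and stop in the kitchen.",
        ["hallway"],
        [
            Candidate("left", "a bedroom with a bed"),
            Candidate("forward", "a kitchen with a counter"),
        ],
    )
    print(reasoning)
    print("Imagined:", imagined, "-> action:", action)
```

In this sketch the imagination is a handful of words rather than a synthesised image, which mirrors the abstract's motivation for text-based imagination; the word-overlap scoring is a stand-in chosen for simplicity, not something the paper specifies.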
Taxonomy
Topics: Robotics and Automated Systems
Methods: ALIGN
