Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation

Pingrui Zhang; Yifei Su; Pengyuan Wu; Dong An; Li Zhang; Zhigang Wang; Dong Wang; Yan Ding; Bin Zhao; Xuelong Li

arXiv:2505.20897·cs.CV·June 24, 2025

TL;DR

This paper introduces the Adaptive Text Dreamer, a novel vision-and-language navigation approach that uses a dual-branch language model architecture inspired by human brain hemispheres to improve reasoning and scene imagination, achieving state-of-the-art results.

Contribution

The paper presents a new dual-branch LLM-based architecture with a cross-interaction mechanism for VLN, enhancing reasoning and imagination efficiency over existing vision-based methods.

Findings

01 Achieves state-of-the-art performance on the R2R benchmark.
02 Uses fewer parameters than previous models.
03 Demonstrates effective logical reasoning and scene imagination.

Abstract

Vision-and-Language Navigation (VLN) requires an agent to navigate by following natural-language instructions under partial observability, which makes it difficult to align perception with language. Recent methods mitigate this by imagining future scenes, yet they rely on vision-based synthesis, leading to high computational cost and redundant details. To this end, we propose to adaptively imagine key environmental semantics in language form, enabling a more reliable and efficient strategy. Specifically, we introduce a novel Adaptive Text Dreamer (ATD), a dual-branch self-guided imagination policy built upon a large language model (LLM). ATD is designed with a human-like left-right brain architecture, where the left brain focuses on logical integration, and the right brain is responsible for imaginative prediction of future scenes. To achieve this, we fine-tune only the Q-former within…
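
The left/right split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`right_brain_imagine`, `left_brain_score`, `choose_action`), the lookup table standing in for the LLM's imagination, and the keyword-overlap scoring are all hypothetical, chosen only to show how a text description of an unseen scene could inform the choice among candidate directions.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the LLM "right brain": maps what is visible
# to a text guess of what lies beyond it. A real system would generate this.
IMAGINED_SCENES = {
    "hallway": "a hallway leading to a bedroom",
    "kitchen door": "a kitchen with a counter and sink",
}

STOPWORDS = {"a", "an", "the", "and", "to", "at", "with", "of"}


@dataclass
class Candidate:
    heading: str      # direction the agent could move in
    observation: str  # text description of what is visible that way


def keywords(text: str) -> set:
    """Lowercase, whitespace-split words with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def right_brain_imagine(candidate: Candidate) -> str:
    """Toy imagination branch: predict future scene semantics as text."""
    return IMAGINED_SCENES.get(candidate.observation, "")


def left_brain_score(instruction: str, observation: str, imagined: str) -> int:
    """Toy reasoning branch: count instruction keywords found in the
    observed plus imagined text (an assumed scoring rule, not from the paper)."""
    return len(keywords(instruction) & keywords(observation + " " + imagined))


def choose_action(instruction: str, candidates: list) -> str:
    """Pick the heading whose observed and imagined text best matches the instruction."""
    best = max(
        candidates,
        key=lambda c: left_brain_score(
            instruction, c.observation, right_brain_imagine(c)
        ),
    )
    return best.heading


candidates = [Candidate("left", "hallway"), Candidate("right", "kitchen door")]
action = choose_action("walk past the sink and stop at the counter", candidates)
```

In this toy setup neither visible observation mentions a sink or a counter, so the imagined text is what separates the two candidates. Imagining in text also keeps the prediction to a short phrase instead of a synthesized image, which is the efficiency argument the abstract makes.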

Code & Models

Repositories

zhangpingrui/adaptive-text-dreamer (Official)

Taxonomy

Topics: Robotics and Automated Systems

Methods: ALIGN