Active Test-time Vision-Language Navigation

Heeju Ko; Sungjune Kim; Gyeongrok Oh; Jeongyoon Yoon; Honglak Lee; Sujin Jang; Seungryong Kim; Sangpil Kim

arXiv:2506.06630 · cs.RO · June 10, 2025

TL;DR

This paper introduces ATENA, a test-time active learning framework for vision-language navigation that incorporates human feedback and mixture entropy optimization to improve navigation accuracy in unfamiliar environments.

Contribution

ATENA is the first to combine active learning, mixture entropy optimization, and human-in-the-loop feedback for test-time vision-language navigation.

Findings

01. ATENA outperforms baseline methods on REVERIE, R2R, and R2R-CE benchmarks.

02. It effectively handles distributional shifts during test time.

03. The framework improves uncertainty calibration and decision-making in navigation tasks.

Abstract

Vision-Language Navigation (VLN) policies trained on offline datasets often exhibit degraded task performance when deployed in unfamiliar navigation environments at test time, where agents are typically evaluated without access to external interaction or feedback. Entropy minimization has emerged as a practical solution for reducing prediction uncertainty at test time; however, it can suffer from accumulated errors, as agents may become overconfident in incorrect actions without sufficient contextual grounding. To tackle these challenges, we introduce ATENA (Active TEst-time Navigation Agent), a test-time active learning framework that enables a practical human-robot interaction via episodic feedback on uncertain navigation outcomes. In particular, ATENA learns to increase certainty in successful episodes and decrease it in failed ones, improving uncertainty calibration. Here, we…
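
The calibration idea in the abstract (raise certainty after successful episodes, lower it after failed ones) can be illustrated with a small sketch. This is not the paper's mixture entropy objective, whose definition is cut off above; it is a plain feedback-signed entropy term written under stated assumptions. The function names, the use of a mean over steps, and the `threshold` for deciding when to ask a human are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_episode_entropy(step_logits):
    """Average action entropy over all steps of one episode."""
    return sum(entropy(softmax(l)) for l in step_logits) / len(step_logits)


def needs_feedback(step_logits, threshold):
    """Assumed query rule: ask for feedback only on uncertain episodes."""
    return mean_episode_entropy(step_logits) > threshold


def episode_objective(step_logits, success):
    """Quantity to minimize at test time (illustrative only).

    After a success the entropy itself is minimized (more certainty);
    after a failure its negative is minimized (less certainty).
    """
    h = mean_episode_entropy(step_logits)
    return h if success else -h


if __name__ == "__main__":
    # Two steps, three candidate actions each (made-up scores).
    episode = [[2.0, 0.5, 0.1], [1.0, 1.0, 0.2]]
    if needs_feedback(episode, threshold=0.5):
        print(episode_objective(episode, success=True))
        print(episode_objective(episode, success=False))
```

Plain entropy minimization corresponds to always taking the `success=True` branch, which is how an agent can grow overconfident in wrong actions; the sign flip on failure is the part that depends on episodic human feedback.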

Videos

Active Test-time Vision-Language Navigation · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning