Active Test-time Vision-Language Navigation
Heeju Ko, Sungjune Kim, Gyeongrok Oh, Jeongyoon Yoon, Honglak Lee, Sujin Jang, Seungryong Kim, Sangpil Kim

TL;DR
This paper introduces ATENA, a test-time active learning framework for vision-language navigation that incorporates human feedback and mixture entropy optimization to improve navigation accuracy in unfamiliar environments.
Contribution
ATENA is the first framework to combine active learning, mixture entropy optimization, and human-in-the-loop feedback for test-time vision-language navigation.
Findings
ATENA outperforms baseline methods on the REVERIE, R2R, and R2R-CE benchmarks.
It effectively handles distribution shifts at test time.
The framework improves uncertainty calibration and decision-making in navigation tasks.
Abstract
Vision-Language Navigation (VLN) policies trained on offline datasets often exhibit degraded task performance when deployed in unfamiliar navigation environments at test time, where agents are typically evaluated without access to external interaction or feedback. Entropy minimization has emerged as a practical solution for reducing prediction uncertainty at test time; however, it can suffer from accumulated errors, as agents may become overconfident in incorrect actions without sufficient contextual grounding. To tackle these challenges, we introduce ATENA (Active TEst-time Navigation Agent), a test-time active learning framework that enables a practical human-robot interaction via episodic feedback on uncertain navigation outcomes. In particular, ATENA learns to increase certainty in successful episodes and decrease it in failed ones, improving uncertainty calibration. Here, we…
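The idea of raising certainty after successful episodes and lowering it after failed ones can be illustrated with a small sketch. This is not the paper's mixture entropy objective, which the truncated abstract does not spell out. It uses the plain Shannon entropy of each step's action distribution and flips its sign according to a binary episode-level feedback signal. The names `softmax`, `entropy`, and `episode_loss` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def episode_loss(step_logits, success):
    """Mean action entropy over an episode, signed by the feedback.

    Assumption: `success` is a single binary judgment for the whole episode.
    Minimizing the returned value lowers entropy (more certainty) when the
    episode succeeded and raises entropy (less certainty) when it failed.
    """
    mean_entropy = sum(entropy(softmax(l)) for l in step_logits) / len(step_logits)
    return mean_entropy if success else -mean_entropy


# Two navigation steps, each scoring three candidate actions.
episode = [[2.0, 0.5, 0.1], [0.3, 0.2, 0.1]]
loss_if_success = episode_loss(episode, success=True)
loss_if_failure = episode_loss(episode, success=False)
```

In a real agent the loss would be backpropagated through the policy network at test time. The sketch only computes the scalar, which is enough to show how one feedback bit changes the direction of the update.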
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
