NOLO: Navigate Only Look Once

Bohan Zhou, Zhongbin Zhang, Jiangxing Wang, and Zongqing Lu

arXiv:2408.01384 · cs.CV · November 19, 2024

TL;DR

NOLO is an offline, video-based navigation method that learns an in-context policy and adapts to new scenes from context videos alone, without environment access, fine-tuning, or re-training. It outperforms baseline methods in both simulation and real-world scenes.

Contribution

The paper presents NOLO, a new approach that learns navigation policies from videos using pseudo action labels and offline reinforcement learning, without environment interaction or fine-tuning.

Findings

01. Outperforms baseline methods significantly in simulation and real-world tests.

02. Demonstrates effective in-context learning and scene adaptation.

03. Utilizes optical flow for pseudo action labeling in egocentric videos.

Abstract

The in-context learning ability of Transformer models has brought new possibilities to visual navigation. In this paper, we focus on the video navigation setting, where an in-context navigation policy needs to be learned purely from videos in an offline manner, without access to the actual environment. For this setting, we propose Navigate Only Look Once (NOLO), a method for learning a navigation policy that possesses the in-context ability and adapts to new scenes by taking corresponding context videos as input without finetuning or re-training. To enable learning from videos, we first propose a pseudo action labeling procedure using optical flow to recover the action label from egocentric videos. Then, offline reinforcement learning is applied to learn the navigation policy. Through extensive experiments on different scenes both in simulation and the real world, we show that our…
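The pseudo action labeling step can be illustrated with a small sketch. The excerpt above does not specify how NOLO decodes actions from optical flow, so the heuristic below is an assumption made for illustration only: a strong mean horizontal flow is read as a turn, and flow that expands radially from the image centre is read as forward motion. The function name `pseudo_action`, the action names, and both thresholds are hypothetical.

```python
from typing import List, Tuple

# flow[y][x] = (dx, dy): per-pixel displacement between two consecutive frames.
# In practice this would come from an optical flow estimator; here it is passed in.
Flow = List[List[Tuple[float, float]]]


def pseudo_action(flow: Flow, turn_thresh: float = 1.0,
                  forward_thresh: float = 0.5) -> str:
    """Map a dense flow field to a discrete action label (illustrative heuristic,
    not the procedure from the paper)."""
    h, w = len(flow), len(flow[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # image centre
    mean_dx = 0.0   # average horizontal motion of the scene
    radial = 0.0    # average flow component pointing away from the centre
    for y, row in enumerate(flow):
        for x, (dx, dy) in enumerate(row):
            mean_dx += dx
            rx, ry = x - cx, y - cy
            norm = (rx * rx + ry * ry) ** 0.5
            if norm > 0:
                radial += (dx * rx + dy * ry) / norm
    n = h * w
    mean_dx /= n
    radial /= n

    # Assumption: when the camera turns right, the scene appears to shift left.
    if abs(mean_dx) > turn_thresh:
        return "turn_right" if mean_dx < 0 else "turn_left"
    # Assumption: forward motion makes the scene expand away from the centre.
    if radial > forward_thresh:
        return "move_forward"
    return "stop"


# Synthetic 5x5 flow fields for a quick check.
shift_left = [[(-3.0, 0.0)] * 5 for _ in range(5)]
expanding = [[(x - 2.0, y - 2.0) for x in range(5)] for y in range(5)]
still = [[(0.0, 0.0)] * 5 for _ in range(5)]

print(pseudo_action(shift_left))  # turn_right
print(pseudo_action(expanding))   # move_forward
print(pseudo_action(still))       # stop
```

Labels recovered this way would turn an action-free video into (observation, action) pairs that an offline reinforcement learning method can consume; the choice of flow estimator, action set, and thresholds would need tuning for any real camera and scene.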

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Vision and Imaging · Visual Attention and Saliency Detection

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections