NOLO: Navigate Only Look Once
Bohan Zhou, Zhongbin Zhang, Jiangxing Wang, and Zongqing Lu

TL;DR
NOLO is an offline, video-based navigation method that learns in context: it adapts to a new scene by taking that scene's context video as input, with no environment access, fine-tuning, or re-training, and it outperforms baseline methods in both simulation and real-world tests.
Contribution
The paper presents NOLO, a new approach that learns navigation policies from videos using pseudo action labels and offline reinforcement learning, without environment interaction or fine-tuning.
Findings
Outperforms baseline methods significantly in simulation and real-world tests.
Demonstrates effective in-context learning and scene adaptation.
Utilizes optical flow for pseudo action labeling in egocentric videos.
Abstract
The in-context learning ability of Transformer models has brought new possibilities to visual navigation. In this paper, we focus on the video navigation setting, where an in-context navigation policy needs to be learned purely from videos in an offline manner, without access to the actual environment. For this setting, we propose Navigate Only Look Once (NOLO), a method for learning a navigation policy that possesses the in-context ability and adapts to new scenes by taking corresponding context videos as input without finetuning or re-training. To enable learning from videos, we first propose a pseudo action labeling procedure using optical flow to recover the action label from egocentric videos. Then, offline reinforcement learning is applied to learn the navigation policy. Through extensive experiments on different scenes both in simulation and the real world, we show that our…
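The pseudo action labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the action set, the thresholds, and the function name `label_action_from_flow` are all assumptions made for illustration. The sketch takes a dense optical flow field between two consecutive egocentric frames (computed by any flow estimator) and assigns a discrete action from the dominant motion pattern.

```python
import numpy as np

# Hypothetical discrete action set; the source does not list the actions.
ACTIONS = ("move_forward", "turn_left", "turn_right", "stop")


def label_action_from_flow(flow, turn_thresh=1.0, forward_thresh=0.2):
    """Assign a pseudo action label to one frame pair.

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements in pixels,
          from frame t to frame t+1.
    turn_thresh, forward_thresh: illustrative thresholds (assumptions).
    """
    h, w, _ = flow.shape

    # A camera yawing left makes the scene shift right in the image
    # (positive mean dx); yawing right does the opposite.
    mean_dx = float(flow[..., 0].mean())
    if mean_dx > turn_thresh:
        return "turn_left"
    if mean_dx < -turn_thresh:
        return "turn_right"

    # Forward motion makes the flow expand away from the image center.
    # Project each flow vector onto the outward radial direction.
    ys, xs = np.mgrid[0:h, 0:w]
    rx = xs - (w - 1) / 2.0
    ry = ys - (h - 1) / 2.0
    norm = np.sqrt(rx ** 2 + ry ** 2) + 1e-6
    radial = (flow[..., 0] * rx + flow[..., 1] * ry) / norm
    if float(radial.mean()) > forward_thresh:
        return "move_forward"

    return "stop"
```

Labels produced this way for each consecutive frame pair would give the (observation, action) sequences that an offline learner needs. A real pipeline would likely require calibrated thresholds and some handling of noisy or ambiguous flow.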
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Vision and Imaging · Visual Attention and Saliency Detection
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections
