ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings

Arjun Majumdar; Gunjan Aggarwal; Bhavika Devnani; Judy Hoffman; Dhruv Batra

arXiv:2206.12403 · cs.CV · October 16, 2023 · 41 cites

TL;DR

This paper introduces ZSON, a zero-shot object-goal navigation method. Agents are trained on image-goal navigation with goal images encoded in a multimodal embedding space; afterwards they can be directed to objects in open-world environments with free-form natural language, without any ObjectNav rewards or demonstrations.

Contribution

The paper proposes a zero-shot approach to object-goal navigation: because training goals are image embeddings in a multimodal semantic space, the same agent accepts natural language goals at test time in open-world environments.

Findings

01 Achieves absolute improvements in success of 4.2% to 20.0% over existing zero-shot methods.

02 Enables agents to follow compound natural language instructions (e.g., "Find a kitchen sink").

03 Generalizes across three ObjectNav datasets (Gibson, HM3D, and MP3D).

Abstract

We present a scalable approach for learning open-world object-goal navigation (ObjectNav) -- the task of asking a virtual robot (agent) to find any instance of an object in an unexplored environment (e.g., "find a sink"). Our approach is entirely zero-shot -- i.e., it does not require ObjectNav rewards or demonstrations of any kind. Instead, we train on the image-goal navigation (ImageNav) task, in which agents find the location where a picture (i.e., goal image) was captured. Specifically, we encode goal images into a multimodal, semantic embedding space to enable training semantic-goal navigation (SemanticNav) agents at scale in unannotated 3D environments (e.g., HM3D). After training, SemanticNav agents can be instructed to find objects described in free-form natural language (e.g., "sink", "bathroom sink", etc.) by projecting language goals into the same multimodal, semantic…
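The mechanism described in the abstract, training against image-goal embeddings and then swapping in language-goal embeddings from the same space, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the one-hot concept vectors stand in for a pretrained multimodal embedding space, `encode_image` and `encode_text` are hypothetical encoders, and `pick_view` is a similarity-based stand-in for the learned navigation policy, not the authors' implementation.

```python
import numpy as np

EMBED_DIM = 8
rng = np.random.default_rng(0)

# ASSUMPTION: one-hot "concept" vectors stand in for a pretrained
# multimodal embedding space; a real system would use learned encoders.
NAMES = ("sink", "bed", "chair")
CONCEPTS = dict(zip(NAMES, np.eye(len(NAMES), EMBED_DIM)))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    return v / np.linalg.norm(v)


def encode_image(concept, noise=0.05):
    """Hypothetical image encoder: the concept vector plus small view noise."""
    return normalize(CONCEPTS[concept] + noise * rng.normal(size=EMBED_DIM))


def encode_text(concept):
    """Hypothetical text encoder: maps a word into the same space."""
    return normalize(CONCEPTS[concept])


def pick_view(goal_embedding, view_embeddings):
    """Stand-in for a learned policy: pick the view most similar to the goal."""
    scores = [float(view @ goal_embedding) for view in view_embeddings]
    return int(np.argmax(scores))


# Candidate views the agent could move toward (toy data).
views = [encode_image(name) for name in ("bed", "sink", "chair")]

# Training-style goal: an image embedding. Test-style goal: a text embedding.
image_goal_choice = pick_view(encode_image("sink"), views)
text_goal_choice = pick_view(encode_text("sink"), views)
print(image_goal_choice, text_goal_choice)
```

In this sketch both goals select the same view (index 1, the sink), because the policy stand-in only ever sees a goal vector and never needs to know which modality produced it. That property is what lets a language goal replace an image goal without retraining.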

Code & Models

Repositories

gunagg/zson (PyTorch, official)

Videos

ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications