ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation
Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, Xin Eric Wang

TL;DR
ESC introduces a zero-shot object navigation approach that leverages pre-trained vision, language, and commonsense models to enable embodied agents to navigate to objects without prior training in specific environments.
Contribution
The paper presents a novel zero-shot object navigation method that transfers commonsense knowledge into navigation actions using soft logic predicates, eliminating the need for environment-specific training.
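To make "soft logic predicates" concrete, the sketch below shows one way soft commonsense rules could rank exploration frontiers. It uses the Łukasiewicz relaxation of logical AND, a common choice in soft logic frameworks; whether ESC uses exactly this form is an assumption. The function names, the rule "a frontier near room r is worth exploring if the target is likely in room r", and all scores are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: hypothetical names and scores, not the paper's implementation.

def soft_and(a: float, b: float) -> float:
    """Lukasiewicz soft conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def explore_value(near_room: dict, target_in_room: dict) -> float:
    """Soft truth of 'this frontier is worth exploring'.

    Applies the hypothetical rule  near(f, r) AND in(target, r) -> explore(f)
    for every room r and keeps the strongest support.
    """
    return max(
        (soft_and(near_room.get(room, 0.0), prior)
         for room, prior in target_in_room.items()),
        default=0.0,
    )


def pick_frontier(frontiers: dict, target_in_room: dict) -> str:
    """Return the frontier name with the highest soft 'explore' value."""
    return max(frontiers, key=lambda f: explore_value(frontiers[f], target_in_room))


# Hypothetical inputs: how strongly each frontier appears to border each room,
# and a commonsense prior for where the target object (say, a bed) would be.
frontiers = {
    "f1": {"kitchen": 0.9, "bedroom": 0.1},
    "f2": {"kitchen": 0.2, "bedroom": 0.8},
}
target_in_room = {"kitchen": 0.2, "bedroom": 0.9}

best = pick_frontier(frontiers, target_in_room)
print(best)
```

Because the rule is soft, a frontier with weak evidence for the likely room is down-weighted rather than ruled out, which is the practical difference from a hard constraint.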
Findings
Significant improvement over baselines on MP3D, HM3D, and RoboTHOR benchmarks.
Achieves 288% relative Success Rate improvement over CoW on MP3D.
Sets new state-of-the-art results for zero-shot object navigation.
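The 288% figure is a relative improvement, so it describes a ratio between two success rates, not a difference in percentage points. The short sketch below shows the arithmetic, assuming the usual definition 100 × (new − baseline) / baseline. The success rates in it are hypothetical values chosen only to illustrate the calculation; they are not the numbers reported in the paper.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline`: 100 * (new - baseline) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (new - baseline) / baseline


def implied_ratio(relative_pct: float) -> float:
    """Ratio new / baseline implied by a relative improvement given in percent."""
    return 1.0 + relative_pct / 100.0


# Hypothetical success rates, NOT taken from the paper.
baseline_sr, new_sr = 10.0, 38.8
gain = relative_improvement(new_sr, baseline_sr)  # about 288 (percent)
ratio = implied_ratio(288.0)                      # about 3.88

print(f"relative gain: {gain:.1f}%  implied ratio: {ratio:.2f}x")
```

Under this definition, a 288% relative improvement means the new success rate is roughly 3.88 times the baseline's.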
Abstract
The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience nor any other training on the visual environments. First, ESC leverages a pre-trained vision and language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
