Zero-shot Active Visual Search (ZAVIS): Intelligent Object Search for Robotic Assistants
Jeongeun Park, Taerim Yoon, Jejoon Hong, Youngjae Yu, Matthew Pan, and Sungjoon Choi

TL;DR
This paper introduces AVSW, a flexible system enabling robots to locate objects described in free-form language by planning search paths based on semantic understanding and commonsense knowledge, validated in simulation and real-world tests.
Contribution
It presents a novel active visual search system that handles free-form language commands and plans efficient search paths using semantic maps and commonsense reasoning.
Findings
Outperforms previous methods in success-weighted path length in simulations.
Demonstrates effective real-world object search with a Pioneer-3AT robot.
Achieves higher success rate and efficiency in both simulated and real environments.
Abstract
In this paper, we focus on the problem of efficiently locating a target object described with free-form language using a mobile robot equipped with vision sensors (e.g., an RGBD camera). Conventional active visual search predefines a set of objects to search for, rendering these techniques restrictive in practice. To provide added flexibility in active visual searching, we propose a system where a user can enter target commands using free-form language; we call this system Active Visual Search in the Wild (AVSW). AVSW detects and plans to search for a target object inputted by a user through a semantic grid map represented by static landmarks (e.g., desk or bed). For efficient planning of object search patterns, AVSW considers commonsense knowledge-based co-occurrence and predictive uncertainty while deciding which landmarks to visit first. We validate the proposed method with respect…
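The landmark prioritization described in the abstract can be illustrated with a minimal sketch. It assumes each landmark carries a co-occurrence estimate and a predictive-uncertainty value, and it combines them as a weighted sum. The `Landmark` class, the `rank_landmarks` function, the `beta` weight, and all numeric values are hypothetical and chosen for illustration only; the abstract does not specify how AVSW combines the two signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    name: str
    cooccurrence: float  # assumed estimate that the target is near this landmark
    uncertainty: float   # assumed predictive uncertainty of that estimate


def rank_landmarks(landmarks, beta=0.5):
    """Sort landmarks by cooccurrence + beta * uncertainty, highest first.

    The additive score is an illustrative assumption, not the paper's formula.
    """
    return sorted(
        landmarks,
        key=lambda lm: lm.cooccurrence + beta * lm.uncertainty,
        reverse=True,
    )


# Hypothetical values for a single target object.
landmarks = [
    Landmark("desk", 0.50, 0.10),
    Landmark("bed", 0.25, 0.50),
    Landmark("sofa", 0.25, 0.00),
]

order = [lm.name for lm in rank_landmarks(landmarks)]
print(order)
```

In this sketch, a larger `beta` favors landmarks whose estimate is uncertain, so they are visited earlier, while `beta=0` ranks purely by co-occurrence.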
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Semi-Pseudo-Label
