Language-Conditioned Observation Models for Visual Object Search

Thao Nguyen; Vladislav Hrosinkov; Eric Rosen; Stefanie Tellex

arXiv:2309.07276 · cs.RO · September 15, 2023

TL;DR

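This paper introduces a language-conditioned observation model (LCOM) for robotic visual object search. A single neural network, conditioned on a complex language description, supplies both the object detection and the sensor noise used in the observation model, so the noise estimate changes dynamically instead of staying fixed. In simulation, this raises the task completion rate from 0.46 to 0.66.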

Contribution

The work presents a novel neural network-based observation model that conditions object detection and noise modeling on language descriptions, allowing flexible and scalable object search.

Findings

01

Significantly improved task completion rate from 0.46 to 0.66 in simulation.

02

Demonstrated successful real-world deployment on a Boston Dynamics Spot robot.

03

Outperformed fixed-noise models in efficiency and speed of object search.

Abstract

Object search is a challenging task because when given complex language descriptions (e.g., "find the white cup on the table"), the robot must move its camera through the environment and recognize the described object. Previous works map language descriptions to a set of fixed object detectors with predetermined noise models, but these approaches are challenging to scale because new detectors need to be made for each object. In this work, we bridge the gap in realistic object search by posing the search problem as a partially observable Markov decision process (POMDP) where the object detector and visual sensor noise in the observation model are determined by a single deep neural network conditioned on complex language descriptions. We incorporate the neural network's outputs into our language-conditioned observation model (LCOM) to represent dynamically changing sensor noise. With an…
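
The POMDP framing in the abstract can be illustrated with a minimal Python sketch of a belief update whose detector noise varies per observation. Everything here is an assumption made for illustration, not the paper's implementation: the grid-cell belief, the binary "detector fired" observation, and the function names update_belief and noise_from_confidence (a hypothetical linear mapping from a detector confidence score to true and false positive rates) do not come from the source.

```python
import numpy as np


def update_belief(belief, fov_mask, detected, tpr, fpr):
    """Bayes update of a belief over grid cells from one detector reading.

    belief   : 1-D array of P(object in cell), summing to 1
    fov_mask : boolean array, True for cells inside the camera's field of view
    detected : True if the detector fired on this view
    tpr, fpr : true/false positive rates assumed for this reading
    """
    belief = np.asarray(belief, dtype=float)
    fov_mask = np.asarray(fov_mask, dtype=bool)
    # P(detector fires | object in cell): tpr if the cell is in view,
    # otherwise the firing can only be a false positive.
    p_fire = np.where(fov_mask, tpr, fpr)
    likelihood = p_fire if detected else 1.0 - p_fire
    posterior = likelihood * belief
    return posterior / posterior.sum()


def noise_from_confidence(confidence, base_tpr=0.6, base_fpr=0.3):
    """Hypothetical mapping from a detector confidence in [0, 1] to (tpr, fpr).

    Assumption for illustration only: higher confidence means a more
    trustworthy reading, so tpr rises and fpr falls linearly.
    """
    c = float(np.clip(confidence, 0.0, 1.0))
    tpr = base_tpr + (1.0 - base_tpr) * 0.9 * c
    fpr = base_fpr * (1.0 - 0.9 * c)
    return tpr, fpr


# Uniform prior over four cells; the camera sees the first two.
prior = np.full(4, 0.25)
in_view = np.array([True, True, False, False])

for confidence in (0.1, 0.9):
    tpr, fpr = noise_from_confidence(confidence)
    posterior = update_belief(prior, in_view, detected=True, tpr=tpr, fpr=fpr)
    print(confidence, np.round(posterior, 3))
```

With a fixed-noise model, tpr and fpr would be constants chosen per detector. In this sketch they change with each reading, so a confident detection shifts more probability mass onto the cells in view than an uncertain one does.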

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques