Robot Object Retrieval with Contextual Natural Language Queries
Thao Nguyen, Nakul Gopalan, Roma Patel, Matt Corsaro, Ellie Pavlick, Stefanie Tellex

TL;DR
This paper introduces a model that lets robots retrieve objects from natural language descriptions of how the objects are used. The model generalizes to unseen object classes and unknown nouns, and it is demonstrated on a robot arm alongside a new dataset.
Contribution
The work presents a model that predicts object appearance from usage descriptions, allowing retrieval without explicit class labels and generalizing to unseen objects and language.
Findings
Achieves 62.3% accuracy on unseen ImageNet classes
Achieves 53.0% accuracy on unseen object classes with unknown nouns
Demonstrates real-time retrieval on a robot arm
Abstract
Natural language object retrieval is a highly useful yet challenging task for robots in human-centric environments. Previous work has primarily focused on commands specifying the desired object's type such as "scissors" and/or visual attributes such as "red," thus limiting the robot to only known object classes. We develop a model to retrieve objects based on descriptions of their usage. The model takes in a language command containing a verb, for example "Hand me something to cut," and RGB images of candidate objects and selects the object that best satisfies the task specified by the verb. Our model directly predicts an object's appearance from the object's use specified by a verb phrase. We do not need to explicitly specify an object's class label. Our approach allows us to predict high level concepts like an object's utility based on the language query. Based on contextual…
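The selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the verb phrase has already been mapped to a predicted appearance vector, that each candidate RGB image has already been encoded as a feature vector of the same length, and that candidates are scored by cosine similarity. The function names and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_object(
    predicted_appearance: Sequence[float],
    candidate_features: Sequence[Sequence[float]],
) -> int:
    """Return the index of the candidate whose features best match the prediction.

    Assumption: cosine similarity is the scoring function; the source text
    only says the model selects the object that best satisfies the task.
    """
    if not candidate_features:
        raise ValueError("at least one candidate object is required")
    scores = [cosine_similarity(predicted_appearance, f) for f in candidate_features]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: a hypothetical appearance vector predicted from "something to cut"
# and three hypothetical candidate image feature vectors.
predicted = [1.0, 0.0, 0.5]
candidates = [
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.4],
    [-1.0, 0.0, 0.0],
]
best_index = select_object(predicted, candidates)
print(best_index)
```

In this sketch the robot would pick up the candidate at the returned index. Because the comparison happens in feature space rather than over class labels, nothing in the selection step requires the candidate's class to have been seen before, which matches the motivation given in the abstract.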
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
