Extracting Zero-shot Common Sense from Large Language Models for Robot 3D Scene Understanding

William Chen; Siyi Hu; Rajat Talak; Luca Carlone

arXiv:2206.04585 · cs.RO · June 22, 2022 · 1 citation

Open Access

TL;DR

This paper presents a zero-shot method that uses the common sense embedded in large language models to improve semantic 3D scene understanding in robotics: it labels rooms from the objects they contain, without task-specific training.

Contribution

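The paper introduces a zero-shot approach that queries large language models to label rooms given the objects they contain. It operates on 3D scene graphs produced by spatial perception systems and generalizes to previously unseen object and room labels without task-specific training data.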

Findings

01

Zero-shot labeling of rooms from the objects they contain

02

No need for task-specific pre-training

03

Generalizes to unseen labels

Abstract

Semantic 3D scene understanding is a problem of critical importance in robotics. While significant advances have been made in simultaneous localization and mapping algorithms, robots are still far from having the common sense knowledge about household objects and their locations of an average human. We introduce a novel method for leveraging common sense embedded within large language models for labelling rooms given the objects contained within. This algorithm has the added benefits of (i) requiring no task-specific pre-training (operating entirely in the zero-shot regime) and (ii) generalizing to arbitrary room and object labels, including previously-unseen ones -- both of which are highly desirable traits in robotic scene understanding algorithms. The proposed algorithm operates on 3D scene graphs produced by modern spatial perception systems, and we hope it will pave the way to more…
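The abstract does not give the prompt format or the scoring procedure, so the Python sketch below is only one plausible reading of "labelling rooms given the objects contained within", not the authors' implementation. It fills a hypothetical sentence template once per candidate room label and keeps the label whose sentence scores highest. The names build_query, label_room, toy_score, and TOY_ASSOCIATIONS are all assumptions introduced for illustration. In particular, toy_score is a hand-written stand-in so the example runs without a model; in practice it would be replaced by a language model's likelihood of the sentence.

```python
from typing import Callable, Sequence


def build_query(objects: Sequence[str], room: str) -> str:
    """Fill a hypothetical sentence template (not taken from the paper)."""
    return f"A room containing {', '.join(objects)} is called a {room}."


def label_room(
    objects: Sequence[str],
    candidate_rooms: Sequence[str],
    score: Callable[[str], float],
) -> str:
    """Return the candidate room whose filled-in sentence scores highest.

    No training is involved: any list of room labels can be passed in,
    which is what makes the procedure zero-shot in this sketch.
    """
    return max(candidate_rooms, key=lambda room: score(build_query(objects, room)))


# Toy stand-in for a language model: a tiny hand-written association table.
# This is an assumption made only so the example is runnable offline.
TOY_ASSOCIATIONS = {
    "kitchen": {"stove", "refrigerator", "sink"},
    "bathroom": {"toilet", "shower", "sink"},
    "bedroom": {"bed", "nightstand", "wardrobe"},
}


def toy_score(sentence: str) -> float:
    """Count listed objects associated with the room named in the sentence."""
    prefix, _, tail = sentence.partition(" is called a ")
    room = tail.rstrip(".")
    mentioned = {word.strip(",") for word in prefix.split()}
    return float(len(mentioned & TOY_ASSOCIATIONS.get(room, set())))


if __name__ == "__main__":
    rooms = ["kitchen", "bathroom", "bedroom"]
    print(label_room(["stove", "sink", "refrigerator"], rooms, toy_score))
```

Passing the scorer in as a function keeps the labeling logic independent of any particular language model, so the candidate room list and the object list can both change at query time without retraining anything.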

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning