Clio: Real-time Task-Driven Open-Set 3D Scene Graphs

Dominic Maggio; Yun Chang; Nathan Hughes; Matthew Trang; Dan Griffith; Carlyn Dougherty; Eric Cristofalo; Lukas Schmid; Luca Carlone

arXiv:2404.13696·cs.RO·September 30, 2024·2 cites

TL;DR

Clio is a real-time system that builds task-specific 3D scene graphs by clustering 3D primitives in the environment according to their relevance to a given list of natural-language tasks, improving robot perception and task-execution accuracy.
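Clustering "according to relevance to a task list" presupposes some score telling how relevant each primitive is to each task. The page does not say how that score is computed, so the sketch below is only an assumption: each primitive and each task are taken to be embedded in a shared vector space (for example, CLIP image and text features), and relevance is modeled as a temperature-scaled softmax over cosine similarities. The names `task_relevance` and `temperature` are hypothetical and do not come from the paper or the mit-spark/clio repository.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def task_relevance(primitive: Sequence[float],
                   tasks: List[Sequence[float]],
                   temperature: float = 0.1) -> List[float]:
    """Hypothetical p(task | primitive): softmax over temperature-scaled cosine similarities."""
    logits = [cosine(primitive, t) / temperature for t in tasks]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    tasks = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-D "text embeddings" for two tasks
    print(task_relevance([0.9, 0.1], tasks))  # most mass on the first task
```

A lower `temperature` makes the distribution sharper, so each primitive commits more strongly to its best-matching task; a primitive whose embedding is all zeros receives a uniform distribution.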

Contribution

This paper introduces a task-driven 3D scene understanding framework using the Information Bottleneck, with a real-time pipeline for hierarchical scene graph construction on robots.
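The Information Bottleneck trades compression of the primitives X into clusters T against preserving information about the tasks Y. One standard way to run it is the agglomerative Information Bottleneck of Slonim and Tishby: start with one cluster per primitive and greedily merge the pair whose merge loses the least mutual information I(T;Y), where the loss is (p(t_i) + p(t_j)) times the weighted Jensen-Shannon divergence between p(y|t_i) and p(y|t_j). The sketch below implements that greedy rule under stated assumptions, not the authors' pipeline: p(y|x) is given per primitive, p(x) is uniform, only pairs listed in a hypothetical `edges` set (for example, spatially adjacent primitives) may merge, and merging stops once the cheapest merge would lose more than a hypothetical `max_loss` bits.

```python
import math
from typing import List, Sequence, Set, Tuple


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence in bits; terms with p == 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def merge_cost(w_i: float, p_i: Sequence[float], w_j: float, p_j: Sequence[float]) -> float:
    """Loss of I(T;Y) when merging clusters i and j: (w_i + w_j) * weighted JS divergence."""
    w = w_i + w_j
    pi_i, pi_j = w_i / w, w_j / w
    m = [pi_i * a + pi_j * b for a, b in zip(p_i, p_j)]
    return w * (pi_i * kl(p_i, m) + pi_j * kl(p_j, m))


def agglomerative_ib(cond: List[Sequence[float]],
                     edges: Set[Tuple[int, int]],
                     max_loss: float) -> List[List[int]]:
    """Greedy AIB over allowed pairs; returns clusters as sorted lists of primitive indices."""
    n = len(cond)
    weight = {i: 1.0 / n for i in range(n)}   # assumed uniform p(x)
    dist = {i: list(cond[i]) for i in range(n)}  # p(y | cluster)
    members = {i: [i] for i in range(n)}
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    while True:
        # Find the allowed pair whose merge loses the least information.
        best = None
        for i in adj:
            for j in adj[i]:
                if i < j:
                    c = merge_cost(weight[i], dist[i], weight[j], dist[j])
                    if best is None or c < best[0]:
                        best = (c, i, j)
        if best is None or best[0] > max_loss:
            break
        _, i, j = best
        # Merge j into i: p(t) adds up, p(y|t) is the weight-averaged distribution.
        w = weight[i] + weight[j]
        dist[i] = [(weight[i] * a + weight[j] * b) / w for a, b in zip(dist[i], dist[j])]
        weight[i] = w
        members[i].extend(members[j])
        # Reconnect j's neighbors to i so adjacency stays symmetric.
        for k in adj[j]:
            if k != i:
                adj[k].discard(j)
                adj[k].add(i)
                adj[i].add(k)
        adj[i].discard(j)
        for table in (weight, dist, members, adj):
            del table[j]

    return sorted(sorted(m) for m in members.values())
```

For example, four primitives in a chain, two leaning toward one task and two toward another, with `cond = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.15, 0.85]]` and `edges = {(0, 1), (1, 2), (2, 3)}`, split into `[[0, 1], [2, 3]]` at `max_loss=0.05`; a looser threshold lets them collapse into a single cluster. Restricting merges to listed pairs is a design choice made here to keep clusters spatially connected, and recomputing every pair cost each iteration is kept only for readability.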

Findings

01 Enables real-time, compact open-set 3D scene graphs
02 Improves task execution accuracy by focusing on relevant concepts
03 Demonstrates effective clustering of 3D primitives into task-relevant objects

Abstract

Modern tools for class-agnostic image segmentation (e.g., SegmentAnything) and open-set semantic understanding (e.g., CLIP) provide unprecedented opportunities for robot perception and mapping. While traditional closed-set metric-semantic maps were restricted to tens or hundreds of semantic classes, we can now build maps with a plethora of objects and countless semantic variations. This leaves us with a fundamental question: what is the right granularity for the objects (and, more generally, for the semantic concepts) the robot has to include in its map representation? While related work implicitly chooses a level of granularity by tuning thresholds for object detection, we argue that such a choice is intrinsically task-dependent. The first contribution of this paper is to propose a task-driven 3D scene understanding problem, where the robot is given a list of tasks in natural language…

Code & Models

Repositories

mit-spark/clio
Official repository

Taxonomy

Topics: Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis · Human Pose and Action Recognition