Multi-Modal 3D Scene Graph Updater for Shared and Dynamic Environments

Emilio Olivastri, Jonathan Francis, Alberto Pretto, Niko Sünderhauf, and Krishan Rana

arXiv:2411.02938 · cs.RO · November 6, 2024

TL;DR

This paper introduces a framework for real-time, multimodal updating of 3D scene graphs in dynamic environments, enhancing robotic understanding and interaction in changing spaces.

Contribution

It presents a novel multimodal approach for updating 3D scene graphs in real-time, addressing the challenge of dynamic environments for robotic applications.

Findings

1. Initial results show improved scene graph consistency during changes.

2. The framework effectively integrates multimodal inputs for scene updates.

3. The paper outlines future research directions for dynamic environment understanding.

Abstract

The advent of generalist Large Language Models (LLMs) and Vision-Language Models (VLMs) has streamlined the construction of semantically enriched maps that enable robots to ground high-level reasoning and planning in their representations. One of the most widely used semantic map formats is the 3D Scene Graph, which captures both metric (low-level) and semantic (high-level) information. However, these maps often assume a static world, while real environments, such as homes and offices, are dynamic. Even small changes in these spaces can significantly impact task performance. To operate in dynamic environments, robots must detect changes and update the scene graph in real time. This update process is inherently multimodal, requiring input from various sources, such as human agents, the robot's own perception system, time, and its actions. This work proposes a framework that…
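
To make the idea of a multimodal update concrete, here is a minimal Python sketch of a scene graph whose nodes can be revised by any of the four sources the abstract lists (human agents, perception, time, and the robot's actions). It is an illustration under stated assumptions, not the authors' framework: the names `Source`, `ObjectNode`, `SceneGraph`, `apply_update`, and `stale`, along with the age-based staleness rule, are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Source(Enum):
    """The update sources named in the abstract (enum itself is hypothetical)."""
    HUMAN = "human"
    PERCEPTION = "perception"
    TIME = "time"
    ACTION = "action"


@dataclass
class ObjectNode:
    """Hypothetical object node holding metric and semantic information."""
    name: str
    position: Tuple[float, float, float]  # metric (low-level) information
    parent: str                           # semantic (high-level) relation, e.g. the room
    last_seen: float = 0.0                # timestamp of the most recent update
    last_source: Optional[Source] = None  # which source produced that update


@dataclass
class SceneGraph:
    """Hypothetical flat scene graph: object nodes keyed by name."""
    nodes: Dict[str, ObjectNode] = field(default_factory=dict)

    def add(self, node: ObjectNode) -> None:
        self.nodes[node.name] = node

    def apply_update(
        self,
        name: str,
        source: Source,
        timestamp: float,
        position: Optional[Tuple[float, float, float]] = None,
        parent: Optional[str] = None,
    ) -> None:
        """Overwrite whichever fields the update carries and record its origin."""
        node = self.nodes[name]
        if position is not None:
            node.position = position
        if parent is not None:
            node.parent = parent
        node.last_seen = timestamp
        node.last_source = source

    def stale(self, now: float, max_age: float) -> List[str]:
        """Assumed time-based rule: nodes not updated within max_age need re-checking."""
        return sorted(
            n.name for n in self.nodes.values() if now - n.last_seen > max_age
        )


graph = SceneGraph()
graph.add(ObjectNode("mug", (1.0, 0.5, 0.9), parent="kitchen"))
graph.add(ObjectNode("book", (3.0, 2.0, 0.4), parent="living_room"))

# A person reports that the mug was moved to another room.
graph.apply_update("mug", Source.HUMAN, timestamp=50.0, parent="office")
# The robot's perception system re-observes the book at a slightly shifted pose.
graph.apply_update("book", Source.PERCEPTION, timestamp=10.0, position=(3.2, 2.0, 0.4))

stale_nodes = graph.stale(now=100.0, max_age=60.0)
print(stale_nodes)
```

In this sketch, every source writes through the same `apply_update` path, so the graph keeps one record of when and how each node was last confirmed. A real system would also need to resolve conflicts between sources, which this example does not attempt.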

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Robotics and Sensor-Based Localization · Advanced Vision and Imaging