Articulated 3D Scene Graphs for Open-World Mobile Manipulation

Martin Büchner; Adrian Röfer; Tim Engelbracht; Tim Welschehold; Zuria Bauer; Hermann Blum; Marc Pollefeys; Abhinav Valada

arXiv:2602.16356 · cs.RO · February 19, 2026

TL;DR

This paper introduces MoMa-SG, a framework for creating semantic-kinematic 3D scene graphs from RGB-D data, enabling robots to understand and manipulate articulated objects in complex environments.

Contribution

The paper presents a novel unified twist estimation method for modeling object articulation and introduces the Arti4D-Semantic dataset for articulated scene understanding.
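A revolute joint and a prismatic joint can both be written as one rigid-body twist ξ = (ω, v). A rotation about an axis gives ω ≠ 0, with axis direction ω/‖ω‖ and a point on the axis ω × v/‖ω‖², while a pure translation gives ω ≈ 0, with motion along v. The Python sketch below illustrates only that shared parameterization; it is not the paper's unified twist optimization. It swaps in a closed-form fit (Kabsch alignment of one pair of corresponding 3D point sets followed by the SE(3) logarithm), and the helper names `fit_rigid_transform`, `se3_log`, `twist_to_joint` and the threshold `rot_thresh` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: closed-form twist fit from one frame pair,
# NOT the paper's single-pass unified twist optimization.


def fit_rigid_transform(P, Q):
    """Kabsch: rotation R and translation t minimizing ||R @ p + t - q|| over row-paired points."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp


def se3_log(R, t):
    """Twist (omega, v) whose exponential is (R, t); assumes the rotation angle is below pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-6:                            # (near-)pure translation
        return np.zeros(3), t.copy()
    W = (R - R.T) * theta / (2.0 * np.sin(theta))   # skew(omega), ||omega|| = theta
    omega = np.array([W[2, 1], W[0, 2], W[1, 0]])
    coef = (1.0 - theta * np.sin(theta) / (2.0 * (1.0 - np.cos(theta)))) / theta**2
    V_inv = np.eye(3) - 0.5 * W + coef * (W @ W)    # inverse of the SE(3) left Jacobian V
    return omega, V_inv @ t


def twist_to_joint(omega, v, rot_thresh=1e-3):
    """Hypothetical classifier: read revolute or prismatic parameters off a twist."""
    theta = np.linalg.norm(omega)
    if theta < rot_thresh:                      # no rotation: prismatic joint along v
        dist = np.linalg.norm(v)                # assumes the part actually moved (dist > 0)
        return {"type": "prismatic", "axis": v / dist, "displacement": dist}
    # Revolute: axis direction omega/theta and the point on the axis closest to the origin
    return {"type": "revolute", "axis": omega / theta,
            "point": np.cross(omega, v) / theta**2, "angle": theta}


# Usage: a door-like rotation of 30 degrees about the z-axis through (1, 0, 0).
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
ang = np.deg2rad(30.0)
Rz = np.array([[np.cos(ang), -np.sin(ang), 0.0],
               [np.sin(ang), np.cos(ang), 0.0],
               [0.0, 0.0, 1.0]])
hinge = np.array([1.0, 0.0, 0.0])
Q = (P - hinge) @ Rz.T + hinge
joint = twist_to_joint(*se3_log(*fit_rigid_transform(P, Q)))
print(joint["type"], np.round(joint["axis"], 3), np.round(joint["point"], 3))
```

Because the joint type falls out of the magnitude of ω rather than being chosen up front, one code path covers both joint types. A practical estimator would fit all frames of a trajectory jointly and be robust to tracking noise and outliers; a single noise-free frame pair is used here only to keep the geometry visible.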

Findings

1. MoMa-SG accurately infers object kinematics from RGB-D sequences.
2. The approach enables robust manipulation of articulated objects in real-world settings.
3. Extensive evaluation shows high performance on multiple datasets.

Abstract

Semantics has enabled 3D scene understanding and affordance-driven object interaction. However, robots operating in real-world environments face a critical limitation: they cannot anticipate how objects move. Long-horizon mobile manipulation requires closing the gap between semantics, geometry, and kinematics. In this work, we present MoMa-SG, a novel framework for building semantic-kinematic 3D scene graphs of articulated scenes containing a myriad of interactable objects. Given RGB-D sequences containing multiple object articulations, we temporally segment object interactions and infer object motion using occlusion-robust point tracking. We then lift point trajectories into 3D and estimate articulation models using a novel unified twist estimation formulation that robustly estimates revolute and prismatic joint parameters in a single optimization pass. Next, we associate objects with…
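Lifting 2D point trajectories into 3D amounts to back-projecting each tracked pixel through the aligned depth map. The Python sketch below assumes a pinhole camera with intrinsics matrix `K`, per-frame metric depth maps, and a per-track visibility mask from the 2D point tracker; the function `lift_tracks`, its array layout, and the nearest-pixel depth lookup are hypothetical choices. The points stay in the camera frame, so moving them to a common world frame would additionally require camera poses.

```python
import numpy as np


def lift_tracks(tracks_uv, visible, depth, K):
    """Back-project 2D point tracks into camera-frame 3D points (hypothetical helper).

    tracks_uv: (T, N, 2) pixel coordinates (u, v) of N tracks over T frames
    visible:   (T, N) boolean visibility flags reported by the 2D tracker
    depth:     (T, H, W) metric depth maps aligned with the RGB frames
    K:         (3, 3) pinhole intrinsics
    Returns (T, N, 3) points; entries are NaN where a track is occluded,
    off-image, or has no valid depth.
    """
    T = tracks_uv.shape[0]
    H, W = depth.shape[1:]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = tracks_uv[..., 0], tracks_uv[..., 1]
    ui, vi = np.rint(u).astype(int), np.rint(v).astype(int)
    inside = (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    # Nearest-pixel depth lookup; indices are clipped so the gather never fails
    z = depth[np.arange(T)[:, None], np.clip(vi, 0, H - 1), np.clip(ui, 0, W - 1)]
    valid = visible & inside & (z > 0)
    # Pinhole back-projection: X = (u - cx) z / fx, Y = (v - cy) z / fy, Z = z
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
    pts[~valid] = np.nan
    return pts


# Usage: two frames of a flat scene 2 m away, three tracks.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
depth = np.full((2, 480, 640), 2.0)
uv = np.array([[[320.0, 240.0], [420.0, 240.0], [700.0, 10.0]]] * 2)
vis = np.array([[True, True, True], [True, False, True]])
pts = lift_tracks(uv, vis, depth, K)
```

Occluded or off-image samples are kept as NaN rather than dropped, so every track keeps one slot per frame and downstream fitting can mask invalid samples frame by frame.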

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition