iTACO: Interactable Digital Twins of Articulated Objects from Casually Captured RGBD Videos

Weikun Peng; Jun Lv; Cewu Lu; Manolis Savva

arXiv:2506.08334 · cs.GR · November 18, 2025

TL;DR

iTACO is a novel framework that creates interactable digital twins of articulated objects from casually captured RGBD videos, enabling scalable and practical digitization for robotics and AI applications.

Contribution

The paper introduces iTACO, a coarse-to-fine method that infers joint parameters and segments the movable parts of articulated objects from casually captured RGBD videos. It also provides a large new dataset for evaluation.

Findings

1. iTACO outperforms existing methods on both synthetic and real videos.

2. The new dataset contains 784 videos of 284 objects and is 20 times larger than prior datasets.

3. iTACO effectively handles simultaneous object and camera motion, occlusions, and casual capture conditions.

Abstract

Articulated objects are prevalent in daily life. Interactable digital twins of such objects have numerous applications in embodied AI and robotics. Unfortunately, current methods to digitize articulated real-world objects require carefully captured data, preventing practical, scalable, and generalizable acquisition. We focus on motion analysis and part-level segmentation of an articulated object from a casually captured RGBD video shot with a hand-held camera. A casually captured video of an interaction with an articulated object is easy to obtain at scale using smartphones. However, this setting is challenging due to simultaneous object and camera motion and significant occlusions as the person interacts with the object. To tackle these challenges, we introduce iTACO: a coarse-to-fine framework that infers joint parameters and segments movable parts of the object from a dynamic RGBD…
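The abstract is truncated before it describes how iTACO estimates joint parameters, so the sketch below is not the authors' method. It is a generic, hypothetical helper (`joint_from_relative_motion`) that only illustrates what "joint parameters" typically means, assuming the common revolute/prismatic joint model and assuming the movable part's rigid motion relative to the object base (x' = R x + t) is already known. A revolute joint is summarized by an axis direction, a pivot point on the axis, and an angle; a prismatic joint by a direction and a distance.

```python
import numpy as np


def joint_from_relative_motion(R, t, rot_tol_deg=2.0):
    """Hypothetical helper: interpret a rigid part motion x' = R @ x + t as a joint.

    Assumes a pure revolute motion (no translation along the axis) or a pure
    prismatic motion. Returns the joint type, unit axis direction, a point on
    the axis (revolute only), and the motion amount (radians or distance).
    """
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float)

    # Rotation angle from the trace: trace(R) = 1 + 2 cos(theta).
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = float(np.arccos(cos_theta))

    if np.degrees(theta) < rot_tol_deg:
        # Negligible rotation: treat the motion as a translation (prismatic joint).
        dist = float(np.linalg.norm(t))
        if dist < 1e-9:
            raise ValueError("part did not move; joint is unobservable")
        return {"type": "prismatic", "axis": t / dist, "pivot": None, "amount": dist}

    # Axis from the skew-symmetric part: R - R^T = 2 sin(theta) [axis]_x.
    # This is ill-conditioned as theta approaches pi.
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    axis /= np.linalg.norm(axis)

    # Rotation about an axis through p gives t = (I - R) p. (I - R) is rank 2,
    # so the minimum-norm least-squares solution is the point on the axis
    # closest to the origin.
    pivot, *_ = np.linalg.lstsq(np.eye(3) - R, t, rcond=None)
    return {"type": "revolute", "axis": axis, "pivot": pivot, "amount": theta}


if __name__ == "__main__":
    # A door rotating 90 degrees about a vertical hinge through (1, 0, 0).
    Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    hinge = np.array([1.0, 0.0, 0.0])
    print(joint_from_relative_motion(Rz, (np.eye(3) - Rz) @ hinge))
    # A drawer sliding 0.3 units along y.
    print(joint_from_relative_motion(np.eye(3), [0.0, 0.3, 0.0]))
```

In the casual-capture setting the abstract describes, the hard part is obtaining a reliable part motion in the first place, because the camera and the object move simultaneously and the hand occludes the part. This helper assumes that problem is already solved.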

Datasets

3dlg-hcvc/video2articulation
