Articulat3D: Reconstructing Articulated Digital Twins From Monocular Videos with Geometric and Motion Constraints

Lijun Guo; Haoyu Zhao; Xingyue Zhao; Rong Fu; Linghao Zhuang; Siteng Huang; Zhongyu Li; Hua Zou

arXiv:2603.11606 · cs.CV · March 13, 2026

TL;DR

Articulat3D introduces a framework for reconstructing high-fidelity digital twins of articulated objects from casually captured monocular videos by jointly enforcing geometric and motion constraints, enabling scalable and accurate 3D modeling.

Contribution

The paper presents a new method that jointly enforces geometric and motion constraints for 3D reconstruction from monocular videos, including a motion prior-driven initialization and a refinement process with learnable kinematic primitives.
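The summary mentions learnable kinematic primitives but does not define them. As a minimal sketch, assuming the two joint types most common in articulated objects (revolute hinges and prismatic sliders), the forward model of each primitive fits in a few lines of NumPy. The function names (`rodrigues`, `revolute`, `prismatic`) and the parameterization (unit axis, pivot point, per-frame angle or displacement) are our own illustrative choices, not the paper's. In a learnable setting these parameters would be optimized, for example with an autodiff framework, rather than fixed by hand as they are here.

```python
import numpy as np

# Hypothetical forward models for two kinematic primitives; the paper's
# actual parameterization may differ.

def rodrigues(axis: np.ndarray, angle: float) -> np.ndarray:
    """Rotation matrix for `angle` radians about `axis` (normalized internally)."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of k
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def revolute(points: np.ndarray, axis: np.ndarray, pivot: np.ndarray, angle: float) -> np.ndarray:
    """Rotate (N, 3) points by `angle` about the line through `pivot` along `axis`."""
    R = rodrigues(axis, angle)
    return (points - pivot) @ R.T + pivot

def prismatic(points: np.ndarray, axis: np.ndarray, displacement: float) -> np.ndarray:
    """Translate (N, 3) points by `displacement` along the direction of `axis`."""
    k = axis / np.linalg.norm(axis)
    return points + displacement * k

# Example: a door hinged on the vertical line x = 1, y = 0, opened by 90 degrees.
door_edge = np.array([[2.0, 0.0, 0.0]])
opened = revolute(door_edge, axis=np.array([0.0, 0.0, 1.0]),
                  pivot=np.array([1.0, 0.0, 0.0]), angle=np.pi / 2)
# opened is approximately [[1.0, 1.0, 0.0]]
```

Restricting a part to one of these primitives reduces its per-frame motion from six degrees of freedom to one (an angle or a displacement) plus axis parameters shared across frames, which is one way to keep articulation physically plausible.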

Findings

1. Achieves state-of-the-art results on synthetic benchmarks.
2. Successfully reconstructs articulated objects from casual monocular videos.
3. Demonstrates robustness in real-world uncontrolled conditions.

Abstract

Building high-fidelity digital twins of articulated objects from visual data remains a central challenge. Existing approaches depend on multi-view captures of the object in discrete, static states, which severely constrains their real-world scalability. In this paper, we introduce Articulat3D, a novel framework that constructs such digital twins from casually captured monocular videos by jointly enforcing explicit 3D geometric and motion constraints. We first propose Motion Prior-Driven Initialization, which leverages 3D point tracks to exploit the low-dimensional structure of articulated motion. By modeling scene dynamics with a compact set of motion bases, we facilitate soft decomposition of the scene into multiple rigidly moving groups. Building on this initialization, we introduce Geometric and Motion Constraints Refinement, which enforces physically plausible articulation through…
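The abstract's initialization step models scene dynamics with a compact set of motion bases that softly partition 3D point tracks into rigidly moving groups. The NumPy sketch below illustrates that idea under simplifying assumptions of our own: each basis is a single rigid trajectory (one rotation and translation per frame, relative to frame 0), each point's weights are a softmax over its per-basis residuals, and bases and weights are refined by an EM-style alternation with weighted Kabsch fits. The function names, the farthest-point seeding, the temperature `sigma2`, and the synthetic door scene are hypothetical and not taken from the paper, whose formulation (for example, blending several bases per point) may differ.

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, stabilized by subtracting the row maximum."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def weighted_kabsch(src, dst, w):
    """Weighted rigid fit: R, t minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2."""
    w = w / (w.sum() + 1e-12)
    mu_s, mu_d = w @ src, w @ dst
    H = (src - mu_s).T @ ((dst - mu_d) * w[:, None])  # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_d - R @ mu_s

def soft_rigid_decomposition(tracks, K, n_iters=10, sigma2=1e-2):
    """tracks: (T, N, 3) 3D point tracks. Returns soft weights W (N, K) and
    per-basis, per-frame rotations Rs (K, T, 3, 3) and translations ts (K, T, 3)."""
    T, N, _ = tracks.shape
    x0 = tracks[0]
    disp = (tracks - x0).transpose(1, 0, 2).reshape(N, -1)  # (N, 3T) displacement trajectories
    # Hypothetical initialization: farthest-point seeds in displacement space.
    seeds = [0]
    for _ in range(1, K):
        d2 = ((disp[:, None, :] - disp[seeds][None]) ** 2).sum(-1).min(axis=1)
        seeds.append(int(np.argmax(d2)))
    W = softmax_rows(-((disp[:, None, :] - disp[seeds][None]) ** 2).sum(-1) / sigma2)
    for _ in range(n_iters):
        # M-step: one weighted rigid fit per basis and frame (frame 0 -> frame t).
        Rs, ts = np.zeros((K, T, 3, 3)), np.zeros((K, T, 3))
        for k in range(K):
            for t in range(T):
                Rs[k, t], ts[k, t] = weighted_kabsch(x0, tracks[t], W[:, k])
        # E-step: squared residual of every point under every basis, summed over frames.
        pred = np.einsum("ktij,nj->ktni", Rs, x0) + ts[:, :, None, :]  # (K, T, N, 3)
        resid = ((pred - tracks[None]) ** 2).sum(axis=(1, 3)).T  # (N, K)
        W = softmax_rows(-resid / sigma2)
    return W, Rs, ts

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic scene: a static base plus a door swinging 90 degrees about the z-axis.
rng = np.random.default_rng(0)
base = np.column_stack([rng.uniform(-1.0, -0.3, 100), rng.uniform(-0.5, 0.5, 100), rng.uniform(0.0, 1.0, 100)])
door = np.column_stack([rng.uniform(0.6, 1.0, 100), rng.uniform(-0.02, 0.02, 100), rng.uniform(0.0, 1.0, 100)])
angles = np.linspace(0.0, np.pi / 2, 10)
tracks = np.stack([np.vstack([base, door @ rot_z(a).T]) for a in angles])  # (10, 200, 3)

W, Rs, ts = soft_rigid_decomposition(tracks, K=2)
labels = W.argmax(axis=1)  # hard labels, for inspection only
```

On this synthetic scene the base and the door should land in separate groups, and the door's basis should recover the 90-degree rotation at the final frame. Keeping the weights soft instead of committing to hard labels lets points near a hinge, whose motion is genuinely ambiguous, share weight across groups while the bases are still being refined.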

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis