MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

Haitian Li; Haozhe Xie; Junxiang Xu; Beichen Wen; Fangzhou Hong; Ziwei Liu

arXiv:2603.19231 · cs.CV · March 20, 2026

TL;DR

MonoArt introduces a progressive reasoning framework for monocular 3D reconstruction of articulated objects, achieving stable, accurate, and efficient inference without multi-stage pipelines or external templates.

Contribution

MonoArt proposes a unified, progressive structural reasoning approach that transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings within a single architecture, making articulation inference more stable and interpretable.

Findings

01. Achieves state-of-the-art accuracy on PartNet-Mobility
02. Demonstrates fast inference speed
03. Generalizes to robotic manipulation and scene reconstruction

Abstract

Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression unstable. Existing methods address this challenge through multi-view supervision, retrieval-based assembly, or auxiliary video generation, often sacrificing scalability or efficiency. We present MonoArt, a unified framework grounded in progressive structural reasoning. Rather than predicting articulation directly from image features, MonoArt progressively transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings within a single architecture. This structured reasoning process enables stable and interpretable articulation inference without…

Taxonomy

Topics: Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition