LARM: A Large Articulated-Object Reconstruction Model
Sylvia Yuan, Ruoxi Shi, Xinyue Wei, Xiaoshuai Zhang, Hao Su, Minghua Liu

TL;DR
LARM is a unified feedforward framework that reconstructs detailed, textured 3D articulated objects from sparse images, advancing accuracy and scalability over prior methods by jointly reasoning over geometry, textures, and joint structures.
Contribution
LARM extends a static view synthesis approach to articulated objects, enabling joint reasoning over camera pose and articulation with a transformer-based architecture.
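To make "joint reasoning over camera pose and articulation" concrete, here is a minimal sketch of how a per-pixel query for a target view could carry both a camera ray and an articulation state. This is an illustration under stated assumptions, not the authors' implementation: the Plücker ray encoding is a common choice in transformer-based view synthesis models, and the scalar `joint_state` (for example, a normalized opening amount), along with the function names `plucker_ray` and `target_query`, is hypothetical. The summary above does not say how LARM encodes either quantity.

```python
import math

def cross(a, b):
    # 3D cross product of two vectors given as lists.
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    # Scale a vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def plucker_ray(origin, direction):
    # Plücker coordinates of a camera ray: unit direction d, then moment o x d.
    d = normalize(direction)
    return d + cross(origin, d)

def target_query(origin, direction, joint_state):
    # Hypothetical conditioning vector: 6 ray values plus 1 articulation value.
    # A real model would project this to the transformer's token width.
    return plucker_ray(origin, direction) + [float(joint_state)]

# One query: a ray looking down -z from z = 2, with the joint half open (assumed scale).
token = target_query([0.0, 0.0, 2.0], [0.0, 0.0, -1.0], 0.5)
```

Varying `joint_state` while holding the ray fixed would ask such a model for the same viewpoint at a different articulation, which is the kind of joint variation the contribution describes.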
Findings
Outperforms state-of-the-art methods in both novel view synthesis and 3D reconstruction.
Produces high-quality meshes closely matching input images.
Supports high-fidelity reconstruction across diverse categories.
Abstract
Modeling 3D articulated objects with realistic geometry, textures, and kinematics is essential for a wide range of applications. However, existing optimization-based reconstruction methods often require dense multi-view inputs and expensive per-instance optimization, limiting their scalability. Recent feedforward approaches offer faster alternatives but frequently produce coarse geometry, lack texture reconstruction, and rely on brittle, complex multi-stage pipelines. We introduce LARM, a unified feedforward framework that reconstructs 3D articulated objects from sparse-view images by jointly recovering detailed geometry, realistic textures, and accurate joint structures. LARM extends LVSM, a recent novel view synthesis (NVS) approach for static 3D objects, into the articulated setting by jointly reasoning over camera pose and articulation variation using a transformer-based architecture,…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
