SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge

Yumeng He; Ying Jiang; Jiayin Lu; Yin Yang; Chenfanfu Jiang

arXiv:2512.01629·cs.CV·December 3, 2025

Open Access

TL;DR

SPARK is a framework that reconstructs simulation-ready, part-level articulated 3D objects from a single RGB image by combining vision-language models (VLMs), a generative diffusion transformer, and differentiable optimization.

Contribution

SPARK introduces a pipeline that automates the creation of articulated 3D assets from a single image. It integrates VLMs, a diffusion transformer, differentiable forward kinematics, and differentiable rendering to produce simulation-ready models.

Findings

01. Produces high-quality, simulation-ready articulated assets.
02. Effective across diverse object categories.
03. Enables downstream robotics applications.

Abstract

Articulated 3D objects are critical for embodied AI, robotics, and interactive scene understanding, yet creating simulation-ready assets remains labor-intensive and requires expert modeling of part hierarchies and motion structures. We introduce SPARK, a framework for reconstructing physically consistent, kinematic part-level articulated objects from a single RGB image. Given an input image, we first leverage VLMs to extract coarse URDF parameters and generate part-level reference images. We then integrate the part-image guidance and the inferred structure graph into a generative diffusion transformer to synthesize consistent part and complete shapes of articulated objects. To further refine the URDF parameters, we incorporate differentiable forward kinematics and differentiable rendering to optimize joint types, axes, and origins under VLM-generated open-state supervision. Extensive…
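The refinement step described in the abstract (optimizing joint axes and origins through differentiable forward kinematics) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a single revolute joint with a known opening angle, uses target 3D points in place of the VLM-generated open-state images and differentiable rendering, and uses central finite differences as a stand-in for automatic differentiation. All function names (`rodrigues`, `revolute_fk`, `joint_loss`, `refine_joint`) are hypothetical.

```python
import numpy as np


def rodrigues(axis, angle):
    """Rotation matrix for `angle` radians about `axis` (normalized here)."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def revolute_fk(points, axis, origin, angle):
    """Forward kinematics of one revolute joint: rotate child-part points
    (N, 3) about the line through `origin` with direction `axis`."""
    R = rodrigues(axis, angle)
    return (points - origin) @ R.T + origin


def joint_loss(params, points, target, angle):
    """Mean squared distance between moved points and the open-state target.
    `params` packs the joint axis (first 3 values) and origin (last 3)."""
    axis, origin = params[:3], params[3:]
    moved = revolute_fk(points, axis, origin, angle)
    return np.mean(np.sum((moved - target) ** 2, axis=1))


def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient (assumed stand-in for autodiff)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad


def refine_joint(points, target, angle, init, steps=1000, lr=0.1):
    """Plain gradient descent on the packed axis and origin parameters."""
    params = np.asarray(init, dtype=float).copy()
    for _ in range(steps):
        grad = numerical_grad(
            lambda p: joint_loss(p, points, target, angle), params)
        params -= lr * grad
    return params
```

In this toy setting the axis is only determined up to scale and the origin only up to a shift along the axis, so a sensible check is that the loss decreases rather than that the parameters match exactly. A real pipeline would replace the point-based loss with an image-space loss from a differentiable renderer and would also handle prismatic joints and joint-type selection.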

Taxonomy

Topics: 3D Shape Modeling and Analysis · Robot Manipulation and Learning · Human Motion and Animation