UniArt: Unified 3D Representation for Generating 3D Articulated Objects with Open-Set Articulation
Bu Jin, Weize Li, Songen Gu, Yupeng Zheng, Yuhang Zheng, Zhengyi Zhou, Yao Yao

TL;DR
UniArt is a diffusion-based framework that synthesizes fully articulated 3D objects from a single image, supports open-set articulation prediction, and reports state-of-the-art mesh quality on PartNet-Mobility together with accurate articulation prediction.
Contribution
UniArt introduces a unified latent representation and reversible joint-to-voxel embedding for end-to-end 3D articulated object synthesis from images.
Findings
Achieves state-of-the-art mesh quality on PartNet-Mobility
Demonstrates accurate articulation prediction, including on unseen categories
Enables open-set generalization to novel joint types
Abstract
Articulated 3D objects play a vital role in realistic simulation and embodied robotics, yet manually constructing such assets remains costly and difficult to scale. In this paper, we present UniArt, a diffusion-based framework that directly synthesizes fully articulated 3D objects from a single image in an end-to-end manner. Unlike prior multi-stage techniques, UniArt establishes a unified latent representation that jointly encodes geometry, texture, part segmentation, and kinematic parameters. We introduce a reversible joint-to-voxel embedding, which spatially aligns articulation features with volumetric geometry, enabling the model to learn coherent motion behaviors alongside structural formation. Furthermore, we formulate articulation type prediction as an open-set problem, removing the need for fixed joint semantics and allowing generalization to novel joint categories and unseen…
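The abstract does not spell out how the reversible joint-to-voxel embedding is built, so the following is only a toy sketch of the general idea, not UniArt's actual formulation. It assumes a joint is described by an axis origin and an axis direction, writes those six numbers into every voxel occupied by the moving part, and recovers them by averaging over the same voxels. The function names `embed_joint` and `recover_joint`, the 6-channel layout, and the boolean part mask are all hypothetical choices made for illustration.

```python
import numpy as np

def embed_joint(part_mask, origin, axis):
    """Write joint parameters into every voxel of the moving part.

    part_mask: boolean array of shape (D, H, W), True where the part is.
    origin, axis: length-3 arrays (assumed joint parameterization).
    Returns a (D, H, W, 6) grid that is zero outside the part.
    """
    axis = np.asarray(axis, dtype=np.float64)
    axis = axis / np.linalg.norm(axis)  # store a unit direction
    grid = np.zeros(part_mask.shape + (6,), dtype=np.float32)
    # The same 6-vector is broadcast to all occupied voxels, so the
    # articulation features share the spatial layout of the geometry.
    grid[part_mask] = np.concatenate([np.asarray(origin, dtype=np.float64), axis])
    return grid

def recover_joint(grid, part_mask):
    """Invert embed_joint by averaging the per-voxel features of the part."""
    feats = grid[part_mask]            # shape (N, 6)
    mean = feats.mean(axis=0)
    origin, axis = mean[:3], mean[3:]
    return origin, axis / np.linalg.norm(axis)  # re-normalize the direction

# Round trip on a small synthetic part.
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 0:2] = True
grid = embed_joint(mask, origin=[0.1, 0.2, 0.3], axis=[0.0, 0.0, 2.0])
origin, axis = recover_joint(grid, mask)
```

Averaging over the part's voxels is one simple way to make the mapping invertible even when individual voxel predictions are noisy; the paper may use a different decoding rule.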
Taxonomy
Topics: 3D Shape Modeling and Analysis · Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis
