DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data

Ruiqi Wu; Xinjie Wang; Liu Liu; Chunle Guo; Jiaxiong Qiu; Chongyi Li; Lichao Huang; Zhizhong Su; Ming-Ming Cheng

arXiv:2505.20460·cs.CV·May 29, 2025

TL;DR

DIPO is a new framework that uses dual images to controllably generate articulated 3D objects, leveraging a diffusion model and graph reasoning to improve accuracy and diversity.

Contribution

The paper introduces DIPO, a dual-image diffusion model with a graph reasoner, and a large-scale dataset PM-X for improved articulated object generation.

Findings

01. DIPO outperforms existing methods in generating articulated objects.

02. The PM-X dataset enhances model generalization to complex objects.

03. Dual-image input provides valuable motion information for better predictions.

Abstract

We present DIPO, a novel framework for the controllable generation of articulated 3D objects from a pair of images: one depicting the object in a resting state and the other in an articulated state. Compared to the single-image approach, our dual-image input adds only modest data-collection overhead while providing important motion information, which reliably guides the prediction of kinematic relationships between parts. Specifically, we propose a dual-image diffusion model that captures relationships between the image pair to generate part layouts and joint parameters. In addition, we introduce a Chain-of-Thought (CoT) based graph reasoner that explicitly infers part connectivity relationships. To further improve robustness and generalization on complex articulated objects, we develop a fully automated dataset expansion pipeline, named LEGO-Art, that enriches…
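To make the abstract's terms concrete, the following is a minimal sketch of how an articulated object's part layout, joint parameters, and part connectivity might be represented as a tree. It is an illustration only, not the paper's data format: the `Part` class, its field names, the joint-type strings, and the `is_valid_tree` helper are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Part:
    """One rigid part of an articulated object (hypothetical encoding)."""
    name: str
    center: Vec3                    # bounding-box center (part layout)
    size: Vec3                      # bounding-box extents (part layout)
    parent: Optional[int]           # index of the parent part; None for the base
    joint_type: str = "fixed"       # assumed labels: "fixed", "revolute", "prismatic"
    joint_axis: Vec3 = (0.0, 0.0, 1.0)
    joint_range: Tuple[float, float] = (0.0, 0.0)  # radians or meters


def is_valid_tree(parts: List[Part]) -> bool:
    """Return True if the parent links form a single rooted tree."""
    roots = [i for i, p in enumerate(parts) if p.parent is None]
    if len(roots) != 1:
        return False
    for start in range(len(parts)):
        seen = set()
        node = start
        # Walk up the parent chain; revisiting a node means a cycle.
        while parts[node].parent is not None:
            if node in seen:
                return False
            seen.add(node)
            parent = parts[node].parent
            if not (0 <= parent < len(parts)):
                return False
            node = parent
    return True


# A toy cabinet: a fixed base, a hinged door, and a sliding drawer.
cabinet = [
    Part("base", (0.0, 0.0, 0.5), (1.0, 0.5, 1.0), parent=None),
    Part("door", (0.25, 0.25, 0.7), (0.5, 0.02, 0.6), parent=0,
         joint_type="revolute", joint_axis=(0.0, 0.0, 1.0),
         joint_range=(0.0, 1.57)),
    Part("drawer", (0.0, 0.2, 0.2), (0.9, 0.4, 0.2), parent=0,
         joint_type="prismatic", joint_axis=(0.0, 1.0, 0.0),
         joint_range=(0.0, 0.3)),
]
```

In this sketch, the connectivity check plays the role of a sanity test on an inferred part graph: a prediction whose parent links contain a cycle or more than one base part cannot be assembled into a single kinematic tree.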

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications