AOMGen: Photoreal, Physics-Consistent Demonstration Generation for Articulated Object Manipulation

Yulu Wu; Jiujun Cheng; Haowen Wang; Dengyang Suo; Pei Ren; Qichao Mao; Shangce Gao; Yakun Huang

arXiv:2512.18396 · cs.RO · March 16, 2026

Open Access

TL;DR

AOMGen is a scalable framework that generates photorealistic, physics-consistent demonstration data for articulated object manipulation from minimal real data, significantly improving policy success rates.

Contribution

It introduces a novel data synthesis method that creates diverse, high-quality training data from a single real scan, enhancing manipulation policy training.

Findings

01

Success rate increased from 0% to 88.7% after fine-tuning.

02

Generated data enables policies to generalize to unseen objects.

03

Framework produces multi-view, annotated, and varied demonstration data.

Abstract

Recent advances in Vision-Language-Action (VLA) and world-model methods have improved generalization in tasks such as robotic manipulation and object interaction. However, successful execution of such tasks depends on large, costly collections of real demonstrations, especially for fine-grained manipulation of articulated objects. To address this, we present AOMGen, a scalable data generation framework for articulated manipulation. It is instantiated from a single real scan, a demonstration, and a library of readily available digital assets, and it yields photoreal training data with verified physical states. The framework synthesizes synchronized multi-view RGB that is temporally aligned with action commands and with state annotations for joints and contacts, and it systematically varies camera viewpoints, object styles, and object poses to expand a single execution into a diverse corpus. Experimental…
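
The abstract describes expanding one execution into a corpus by varying camera viewpoints, object styles, and object poses. A minimal sketch of that idea, assuming the three axes are combined as a full Cartesian product, is shown here. The names `Variation` and `expand_variations`, the example axis values, and the use of a single pose offset in degrees are all hypothetical and are not taken from the paper, which may sample or combine variations differently.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Variation:
    """One rendering configuration for a replayed demonstration (hypothetical)."""
    camera: str             # camera viewpoint label
    style: str              # object appearance / asset style label
    pose_offset_deg: float  # object pose perturbation, simplified to one angle


def expand_variations(cameras, styles, pose_offsets_deg):
    """Enumerate every combination of the three variation axes."""
    return [
        Variation(camera, style, offset)
        for camera, style, offset in product(cameras, styles, pose_offsets_deg)
    ]


# Example axis values are placeholders chosen for illustration only.
variations = expand_variations(
    cameras=["front", "left", "wrist"],
    styles=["oak", "steel"],
    pose_offsets_deg=[-15.0, 0.0, 15.0],
)
print(len(variations))  # 3 cameras * 2 styles * 3 poses = 18
```

Under this assumption the corpus size grows multiplicatively with each axis, which is one way a single demonstration could yield many distinct training episodes.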

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis