PAM: A Pose-Appearance-Motion Engine for Sim-to-Real HOI Video Generation
Mingju Gao, Kaisen Yang, Huan-ang Gao, Bohan Li, Ao Ding, Wenyi Li, Yangcheng Yu, Jinkun Liu, Shaocong Xu, Yike Niu, Haohan Chi, Hao Chen, Hao Tang, Yu Zhang, Li Yi, Hao Zhao

TL;DR
PAM is a unified engine that synthesizes controllable human-object interaction videos by integrating pose, appearance, and motion, advancing the capabilities of sim-to-real transfer in embodied AI.
Contribution
We introduce PAM, a comprehensive framework that combines pose, appearance, and motion for realistic HOI video generation, addressing fragmentation in existing methods.
Findings
PAM outperforms existing methods on DexYCB and OAKINK2 datasets in FVD and MPJPE metrics.
Synthetic videos generated by PAM improve hand pose estimation accuracy.
Combining multiple input conditions yields the best video generation quality.
Abstract
Hand-object interaction (HOI) reconstruction and synthesis are becoming central to embodied AI and AR/VR. Yet, despite rapid progress, existing HOI generation research remains fragmented across three disjoint tracks: (1) pose-only synthesis that predicts MANO trajectories without producing pixels; (2) single-image HOI generation that hallucinates appearance from masks or 2D cues but lacks dynamics; and (3) video generation methods that require both the entire pose sequence and the ground-truth first frame as inputs, preventing true sim-to-real deployment. Inspired by the philosophy of Joo et al. (2018), we think that HOI generation requires a unified engine that brings together pose, appearance, and motion within one coherent framework. Thus we introduce PAM: a Pose-Appearance-Motion Engine for controllable HOI video generation. The performance of our engine is validated by: (1) On…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
