Hand-Object Interaction Image Generation
Hezhen Hu, Weilun Wang, Wengang Zhou, Houqiang Li

TL;DR
This paper introduces HOGAN, a novel framework for generating realistic images of hand-object interactions conditioned on specific hand and object configurations, addressing occlusion and topology challenges.
Contribution
We propose a new model-aware representation and a unified surface space to improve hand-object interaction image generation, explicitly handling occlusion and topology complexities.
Findings
Outperforms existing methods on HO3Dv3 and DexYCB datasets.
Effectively preserves structure and fidelity in generated images.
Demonstrates superiority both quantitatively and qualitatively.
Abstract
In this work, we address a new task, hand-object interaction image generation, which aims to conditionally generate a hand-object image given the hand, the object, and their interaction status. The task is challenging and relevant to many potential application scenarios, such as AR/VR games and online shopping. To address this problem, we propose a novel HOGAN framework, which uses an expressive model-aware hand-object representation and leverages its inherent topology to build a unified surface space. In this space, we explicitly consider the complex self- and mutual occlusion that arises during interaction. During final image synthesis, we account for the different characteristics of the hand and the object and generate the target image in a split-and-combine manner. For evaluation, we build a comprehensive protocol to assess both the fidelity and structure preservation of…
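The abstract does not spell out how the split-and-combine step or the occlusion handling is implemented. As a rough illustration of the general idea only (not HOGAN's actual generator), the sketch below assumes the hand and the object have each been rendered separately with a per-pixel depth map, and merges them so that the nearer surface wins at every pixel. The function name `composite`, its arguments, and the use of infinite depth to mean "no surface here" are all hypothetical choices made for this example.

```python
INF = float("inf")  # assumed convention: infinite depth means "no surface at this pixel"


def composite(hand_rgb, hand_depth, obj_rgb, obj_depth, background):
    """Merge separately produced hand and object layers into one image.

    Hypothetical helper, not taken from the paper. Each argument is a
    2D list (rows of pixels) with the same height and width.
    """
    out = []
    for y in range(len(background)):
        row = []
        for x in range(len(background[0])):
            hd = hand_depth[y][x]
            od = obj_depth[y][x]
            if hd == INF and od == INF:
                # Neither layer covers this pixel: keep the background.
                row.append(background[y][x])
            elif hd <= od:
                # Hand is at least as close as the object: hand occludes object.
                row.append(hand_rgb[y][x])
            else:
                # Object is closer: object occludes hand.
                row.append(obj_rgb[y][x])
        out.append(row)
    return out


# Tiny 1x3 example with labels in place of colours.
result = composite(
    hand_rgb=[["H", "H", "H"]],
    hand_depth=[[1.0, 2.0, INF]],
    obj_rgb=[["O", "O", "O"]],
    obj_depth=[[INF, 1.0, INF]],
    background=[["B", "B", "B"]],
)
print(result)
```

In the example, the first pixel is covered only by the hand, the second is covered by both with the object in front, and the third is covered by neither. A learned generator would replace the fixed per-layer colours, but the depth comparison shows why mutual occlusion has to be resolved before the two parts can be combined.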
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Human Motion and Animation
