DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors
Thomas Hanwen Zhu, Ruining Li, Tomas Jakab

TL;DR
DreamHOI is a zero-shot method that synthesizes realistic 3D human-object interactions from text descriptions by combining diffusion models with a dual implicit-explicit mesh representation.
Contribution
DreamHOI introduces a dual implicit-explicit representation of a skinned human mesh, optimized with Score Distillation Sampling (SDS) gradients from text-to-image diffusion models, to generate realistic 3D HOIs without extensive HOI datasets.
Findings
Effective zero-shot synthesis of HOIs from text
Realistic 3D interactions generated with high fidelity
Outperforms existing methods in quality and diversity
Abstract
We present DreamHOI, a novel method for zero-shot synthesis of human-object interactions (HOIs), enabling a 3D human model to realistically interact with any given object based on a textual description. This task is complicated by the varying categories and geometries of real-world objects and the scarcity of datasets encompassing diverse HOIs. To circumvent the need for extensive data, we leverage text-to-image diffusion models trained on billions of image-caption pairs. We optimize the articulation of a skinned human mesh using Score Distillation Sampling (SDS) gradients obtained from these models, which predict image-space edits. However, directly backpropagating image-space gradients into complex articulation parameters is ineffective due to the local nature of such gradients. To overcome this, we introduce a dual implicit-explicit representation of a skinned mesh, combining…
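The SDS step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it shows only the generic image-space SDS gradient, w(t) * (eps_pred - eps), for one noise sample. The function names, the `alpha_bar_t` noise-level parameter, and the toy noise predictor are all assumptions made for illustration. A real setup would query a pretrained text-to-image diffusion model and backpropagate the gradient through a differentiable renderer into the mesh representation. Here the image itself is the optimized parameter.

```python
import numpy as np

def sds_image_gradient(image, predict_noise, alpha_bar_t, rng, weight=1.0):
    """Return weight * (eps_pred - eps), the image-space SDS gradient for one noise sample."""
    eps = rng.standard_normal(image.shape)
    # Forward diffusion: blend the clean image with Gaussian noise at level alpha_bar_t.
    noisy = np.sqrt(alpha_bar_t) * image + np.sqrt(1.0 - alpha_bar_t) * eps
    eps_pred = predict_noise(noisy, alpha_bar_t)
    return weight * (eps_pred - eps)

def toy_predict_noise(noisy, alpha_bar_t, target=0.5):
    # Hypothetical stand-in for a text-conditioned diffusion model: it acts as if
    # the only plausible clean image were a constant image with value `target`.
    return (noisy - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)

rng = np.random.default_rng(0)
image = np.zeros((8, 8))
for _ in range(200):
    grad = sds_image_gradient(image, toy_predict_noise, alpha_bar_t=0.5, rng=rng)
    image -= 0.1 * grad  # plain gradient descent on the pixels
```

With this toy predictor the sampled noise cancels and each update moves every pixel toward `target`, which matches the abstract's description of SDS gradients as image-space edits. Such per-pixel edits are local, which is the limitation the abstract cites when they are pushed directly into articulation parameters.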
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · 3D Shape Modeling and Analysis
Methods: Diffusion
