ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation
Hongjie Li, Hong-Xing Yu, Jiaman Li, Jiajun Wu

TL;DR
ZeroHSI is a novel zero-shot method for synthesizing 4D human-scene interactions in unseen environments by leveraging video generation models and differentiable rendering, without requiring any motion capture data.
Contribution
ZeroHSI introduces a zero-shot approach to generate realistic human-scene interactions without training on MoCap data, using distillation from video generation models and differentiable rendering.
Findings
Successfully synthesizes interactions in static and dynamic scenes.
Generates diverse, contextually appropriate human motions.
Operates without ground-truth motion data.
Abstract
Human-scene interaction (HSI) generation is crucial for applications in embodied AI, virtual reality, and robotics. Yet, existing methods cannot synthesize interactions in unseen environments such as in-the-wild scenes or reconstructed scenes, as they rely on paired 3D scenes and captured human motion data for training, which are unavailable for unseen environments. We present ZeroHSI, a novel approach that enables zero-shot 4D human-scene interaction synthesis, eliminating the need for training on any MoCap data. Our key insight is to distill human-scene interactions from state-of-the-art video generation models, which have been trained on vast amounts of natural human movements and interactions, and use differentiable rendering to reconstruct human-scene interactions. ZeroHSI can synthesize realistic human motions in both static scenes and environments with dynamic objects, without…
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Video Surveillance and Tracking Methods
