Scene Synthesis from Human Motion
Sifan Ye, Yixing Wang, Jiaman Li, Dennis Park, C. Karen Liu, Huazhe Xu, Jiajun Wu

TL;DR
This paper introduces SUMMON, a framework that synthesizes realistic and diverse scenes from human motion data by predicting contact points and optimizing object placement, reducing the need for costly scene capture.
Contribution
The paper presents ContactFormer, a novel contact predictor, and a scene synthesis method that leverages human motion to generate plausible scene layouts with minimal supervision.
Findings
SUMMON produces feasible and diverse scene layouts.
ContactFormer accurately predicts human-scene contact points.
The method can generate extensive human-scene interaction data.
Abstract
Large-scale capture of human motion with diverse, complex scenes, while immensely useful, is often considered prohibitively costly. Meanwhile, human motion alone contains rich information about the scenes that humans reside in and interact with. For example, a sitting human suggests the existence of a chair, and their leg position further implies the chair's pose. In this paper, we propose to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion. Our framework, Scene Synthesis from HUMan MotiON (SUMMON), includes two steps. It first uses ContactFormer, our newly introduced contact predictor, to obtain temporally consistent contact labels from human motion. Based on these predictions, SUMMON then chooses interacting objects and optimizes physical plausibility losses; it further populates the scene with objects that do not interact with humans.…
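The two-step structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a sliding-window majority vote stands in for ContactFormer (a learned predictor), and moving an object to the centroid of the contact points stands in for the optimization of physical plausibility losses. The function names `smooth_labels` and `place_object`, the label strings, and the window size are all assumptions made for illustration.

```python
from collections import Counter


def smooth_labels(frames, window=3):
    """Toy stand-in for temporally consistent contact prediction.

    frames[t][v] is a (hypothetical) per-frame contact label for body
    vertex v at time t. Each label is replaced by the majority label
    within a sliding window of neighboring frames.
    """
    half = window // 2
    smoothed = []
    for t in range(len(frames)):
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        row = []
        for v in range(len(frames[t])):
            votes = Counter(frames[k][v] for k in range(lo, hi))
            row.append(votes.most_common(1)[0][0])
        smoothed.append(row)
    return smoothed


def place_object(contact_points):
    """Toy stand-in for object placement.

    Returns the centroid of the 3D contact points, which is the
    translation minimizing the sum of squared distances between a
    single object anchor point and the contacts. Rotation and
    penetration terms are deliberately left out of this sketch.
    """
    n = len(contact_points)
    return tuple(sum(p[i] for p in contact_points) / n for i in range(3))


# Vertex 0 flickers to "none" at frame 1; smoothing restores "chair".
noisy = [
    ["chair", "none"],
    ["none", "none"],
    ["chair", "none"],
    ["chair", "none"],
]
clean = smooth_labels(noisy)
chair_position = place_object([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

In this sketch the first function only removes single-frame label flicker and the second only solves for a translation; a fuller treatment along the lines of the abstract would also pick which object to place and penalize physically implausible configurations.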
