TL;DR
This paper introduces a 2D-supervised approach to 3D human-object reconstruction in the wild: a flow-based neural network learns a prior over 3D human-object spatial relations from 2D images alone, so the prior can be trained without any 3D annotations.
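A flow-based prior assigns an exact log-density to a 2D layout vector through the change-of-variables formula, log p(x) = log N(f(x); 0, I) + log |det ∂f/∂x|. The sketch below is a minimal RealNVP-style affine coupling flow in NumPy, assuming the human and object 2D keypoints are flattened into one even-length vector. The class names, layer count, and random weights are hypothetical; the flow is untrained and omits the viewport modelling and any conditioning the paper may use. Training would minimize the negative of `log_prob` over layouts extracted from images.

```python
import numpy as np


class AffineCoupling:
    """RealNVP-style affine coupling layer with a tiny fixed MLP (hypothetical weights)."""

    def __init__(self, dim, flip, rng, hidden=16):
        assert dim % 2 == 0, "this sketch assumes an even-length layout vector"
        self.d = dim // 2
        self.flip = flip  # swap which half conditions the other
        self.W1 = rng.normal(0.0, 0.1, (self.d, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, 2 * self.d))

    def _scale_shift(self, a):
        h = np.tanh(a @ self.W1)
        s, t = np.split(h @ self.W2, 2, axis=-1)
        return np.tanh(s), t  # bounded log-scale keeps exp(s) well behaved

    def _halves(self, x):
        a, b = x[..., :self.d], x[..., self.d:]
        return (b, a) if self.flip else (a, b)

    def _join(self, a, b):
        return np.concatenate([b, a] if self.flip else [a, b], axis=-1)

    def forward(self, x):
        """Data -> latent; returns the output and log|det Jacobian| per sample."""
        a, b = self._halves(x)
        s, t = self._scale_shift(a)
        return self._join(a, b * np.exp(s) + t), s.sum(axis=-1)

    def inverse(self, y):
        """Latent -> data (exact inverse of forward)."""
        a, b = self._halves(y)
        s, t = self._scale_shift(a)
        return self._join(a, (b - t) * np.exp(-s))


class LayoutFlow:
    """Stack of coupling layers with a standard-normal base density."""

    def __init__(self, dim, n_layers=4, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.layers = [AffineCoupling(dim, flip=(i % 2 == 1), rng=rng) for i in range(n_layers)]

    def to_latent(self, x):
        z = np.asarray(x, dtype=float)
        log_det = np.zeros(z.shape[:-1])
        for layer in self.layers:
            z, ld = layer.forward(z)
            log_det = log_det + ld
        return z, log_det

    def log_prob(self, x):
        # change of variables: log N(f(x); 0, I) + log|det df/dx|
        z, log_det = self.to_latent(x)
        log_base = -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * self.dim * np.log(2.0 * np.pi)
        return log_base + log_det

    def sample(self, n, seed=1):
        z = np.random.default_rng(seed).standard_normal((n, self.dim))
        for layer in reversed(self.layers):
            z = layer.inverse(z)
        return z


flow = LayoutFlow(dim=8)  # e.g. 2 human + 2 object keypoints, (x, y) each
layouts = flow.sample(5)
print(flow.log_prob(layouts).shape)  # (5,)
```

Alternating `flip` between layers lets every coordinate be transformed by some layer; bounding the log-scale with `tanh` is a common stability choice rather than a requirement.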
Contribution
The paper presents a method that learns a prior over 3D human-object spatial relations solely from 2D images, which improves generalization to real-world scenarios. It also introduces the WildHOI dataset for benchmarking.
Findings
Achieves performance comparable to fully 3D-supervised methods on the BEHAVE dataset.
Outperforms previous methods on in-the-wild images in both generalization and the diversity of interactions handled.
Demonstrates the effectiveness of priors learned from 2D images for 3D reconstruction.
Abstract
Learning the prior knowledge of the 3D human-object spatial relation is crucial for reconstructing human-object interaction from images and understanding how humans interact with objects in 3D space. Previous works learn this prior from datasets collected in controlled environments, but due to the diversity of domains, they struggle to generalize to real-world scenarios. To overcome this limitation, we present a 2D-supervised method that learns the 3D human-object spatial relation prior purely from 2D images in the wild. Our method utilizes a flow-based neural network to learn the prior distribution of the 2D human-object keypoint layout and viewports for each image in the dataset. The effectiveness of the prior learned from 2D images is demonstrated on the human-object reconstruction task by applying the prior to tune the relative pose between the human and the object during the…
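The last step in the abstract, applying the prior to tune the relative pose between the human and the object during optimization, can be sketched as below. Everything here is a hypothetical illustration, not the paper's code: the object pose is reduced to a translation plus a yaw angle, the camera is a fixed pinhole with an assumed focal length (`FOCAL`), the 2D layout is normalized by the human's 2D centroid and spread, and a fixed isotropic Gaussian (`gaussian_log_prior`) stands in for the learned flow's log-density. The objective trades a 2D reprojection term against the weighted negative log-prior, and a finite-difference gradient descent with step halving keeps only steps that lower it.

```python
import numpy as np

FOCAL = 1000.0  # hypothetical pinhole focal length (pixels); camera fixed at the origin


def project(points_cam):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixel offsets."""
    return FOCAL * points_cam[:, :2] / points_cam[:, 2:3]


def yaw_matrix(theta):
    """Rotation about the camera's vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def place_object(obj_local, params):
    """Object keypoints in camera space for params = (tx, ty, tz, yaw)."""
    return obj_local @ yaw_matrix(params[3]).T + params[:3]


def normalized_layout(h2d, o2d):
    """Joint 2D layout, centred on the human's 2D centroid and divided by its 2D spread."""
    center = h2d.mean(axis=0)
    scale = np.linalg.norm(h2d - center, axis=1).mean()
    return ((np.vstack([h2d, o2d]) - center) / scale).ravel()


def gaussian_log_prior(x, mean, sigma=0.5):
    """Stand-in for a learned flow's log_prob: isotropic Gaussian (hypothetical)."""
    d = x - mean
    return -0.5 * (d @ d) / sigma**2 - 0.5 * x.size * np.log(2.0 * np.pi * sigma**2)


def objective(params, human_cam, obj_local, obj_2d_obs, prior_mean, w_prior=0.1, pix_sigma=10.0):
    """Object reprojection error minus the weighted log-prior of the joint layout."""
    h2d = project(human_cam)
    o2d = project(place_object(obj_local, params))
    data = np.sum((o2d - obj_2d_obs) ** 2) / pix_sigma**2
    return data - w_prior * gaussian_log_prior(normalized_layout(h2d, o2d), prior_mean)


def tune_relative_pose(params, *args, iters=200, step=1e-3, eps=1e-5):
    """Finite-difference gradient descent; a step is kept only if it lowers the objective."""
    params = np.asarray(params, dtype=float).copy()
    f = objective(params, *args)
    for _ in range(iters):
        grad = np.array([
            (objective(params + eps * e, *args) - objective(params - eps * e, *args)) / (2 * eps)
            for e in np.eye(params.size)
        ])
        lr = step
        while lr > 1e-12:  # halve the step until the objective decreases
            candidate = params - lr * grad
            f_candidate = objective(candidate, *args)
            if f_candidate < f:
                params, f = candidate, f_candidate
                break
            lr *= 0.5
    return params, f


# Toy usage with made-up keypoints (hypothetical data, not from any dataset).
human_cam = np.array([[0.0, -0.8, 3.0], [0.0, 0.0, 3.0], [0.0, 0.8, 3.0], [-0.2, 0.0, 3.0], [0.2, 0.0, 3.0]])
obj_local = np.array([[-0.2, 0.0, -0.2], [0.2, 0.0, -0.2], [0.2, 0.0, 0.2], [-0.2, 0.0, 0.2]])
obj_2d_obs = project(place_object(obj_local, np.array([0.5, 0.2, 3.0, 0.3])))
prior_mean = normalized_layout(project(human_cam), obj_2d_obs)  # hypothetical "typical" layout
args = (human_cam, obj_local, obj_2d_obs, prior_mean)
init = np.array([0.3, 0.0, 3.5, 0.0])
params, f_final = tune_relative_pose(init, *args)
print(objective(init, *args), f_final)
```

With the Gaussian stand-in, the prior simply pulls the layout toward `prior_mean`; a trained flow's `log_prob`, differentiated with an autodiff framework instead of finite differences, would take its place in a fuller version. The term weights used here are arbitrary.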
