Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

Jason Y. Zhang; Sam Pepose; Hanbyul Joo; Deva Ramanan; Jitendra Malik; Angjoo Kanazawa

arXiv:2007.15649·cs.CV·August 21, 2020·6 cites

TL;DR

This paper introduces a method to infer 3D human-object arrangements from a single in-the-wild image without scene-level 3D supervision, leveraging joint reasoning and novel constraints to resolve ambiguities.

Contribution

It proposes a novel approach that jointly models humans and objects to infer 3D arrangements using constraints like scale, occlusion, and interaction, without requiring scene-level 3D data.

Findings

01 Significantly reduces ambiguity in 3D configurations

02 Effective on challenging in-the-wild images with various objects

03 Outperforms baseline methods in human-object spatial reasoning

Abstract

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment. Notably, our method runs on datasets without any scene- or object-level 3D supervision. Our key insight is that considering humans and objects jointly gives rise to "3D common sense" constraints that can be used to resolve ambiguity. In particular, we introduce a scale loss that learns the distribution of object size from data; an occlusion-aware silhouette re-projection loss to optimize object pose; and a human-object interaction loss to capture the spatial layout of objects with which humans interact. We empirically validate that our constraints dramatically reduce the space of likely 3D spatial configurations. We demonstrate our approach on challenging, in-the-wild images of humans…
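
To make the occlusion-aware silhouette idea in the abstract concrete, here is a minimal NumPy sketch. It assumes the loss compares a rendered object silhouette with the observed instance mask while ignoring pixels covered by another instance. The function name, argument names, and the normalization are hypothetical illustrations; the abstract does not give the formula, and the paper's actual loss may include further terms.

```python
import numpy as np


def occlusion_aware_silhouette_loss(rendered, target, occluder):
    """Mean squared silhouette error over pixels not hidden by an occluder.

    Hypothetical helper for illustration only (not the authors' code).
    rendered: (H, W) array in [0, 1], silhouette rendered from the current pose.
    target:   (H, W) binary array, observed (visible) instance mask.
    occluder: (H, W) binary array, 1 where another instance covers the pixel.
    """
    rendered = np.asarray(rendered, dtype=float)
    target = np.asarray(target, dtype=float)
    # valid == 1 where the pixel should count toward the loss
    valid = 1.0 - np.asarray(occluder, dtype=float)
    sq_err = valid * (rendered - target) ** 2
    # Normalize by the number of valid pixels; guard against an empty mask
    return sq_err.sum() / max(valid.sum(), 1.0)


# Toy scene: a 4x4 object whose right half is hidden behind another instance.
rendered = np.zeros((6, 6))
rendered[1:5, 1:5] = 1.0             # full silhouette at the correct pose
occluder = np.zeros((6, 6))
occluder[:, 3:] = 1.0                # another instance covers columns 3..5
target = rendered * (1.0 - occluder)  # only the visible part is observed

no_occluder = np.zeros((6, 6))
plain = occlusion_aware_silhouette_loss(rendered, target, no_occluder)
aware = occlusion_aware_silhouette_loss(rendered, target, occluder)
```

In this toy case the plain comparison penalizes the correct pose for the hidden half of the object, while the masked version does not, which is the ambiguity such a loss is meant to remove.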

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Advanced Vision and Imaging