Capturing and Inferring Dense Full-Body Human-Scene Contact

Chun-Hao P. Huang; Hongwei Yi; Markus Höschle; Matvey Safroshkin; Tsvetelina Alexiadis; Senya Polikovsky; Daniel Scharstein; Michael J. Black

arXiv:2206.09553 · cs.CV · June 22, 2022 · 6 cites

TL;DR

This paper introduces RICH, a human-scene contact dataset with 4K multiview video and ground-truth 3D human bodies, and proposes BSTRO, a transformer-based model that predicts dense 3D human-scene contact from a single image.

Contribution

The paper presents a new dataset, RICH, with detailed contact labels, and a transformer-based model, BSTRO, that predicts dense 3D human-scene contact from a single image.
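To make "dense" concrete: contact can be represented as one binary label per body-mesh vertex, and a prediction can be scored against the ground truth vertex by vertex. The following is a minimal sketch under stated assumptions, not the paper's evaluation code. The function name `contact_scores`, the 0.5 threshold, the 8-vertex toy mesh, and the choice of precision, recall, and F1 as metrics are all illustrative.

```python
import numpy as np

def contact_scores(pred_prob, gt_contact, threshold=0.5):
    """Precision, recall and F1 for per-vertex binary contact labels.

    pred_prob  : per-vertex contact probabilities in [0, 1]
    gt_contact : per-vertex ground-truth labels (1 = in contact)
    threshold  : hypothetical cut-off used to binarize predictions
    """
    pred = np.asarray(pred_prob) >= threshold
    gt = np.asarray(gt_contact).astype(bool)

    tp = np.sum(pred & gt)    # predicted contact, truly in contact
    fp = np.sum(pred & ~gt)   # predicted contact, not in contact
    fn = np.sum(~pred & gt)   # missed contact

    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return float(precision), float(recall), float(f1)

# Toy mesh with 8 vertices (a real body mesh has thousands).
pred_prob = np.array([0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.6, 0.05])
gt_contact = np.array([1, 1, 0, 0, 0, 1, 1, 0])
precision, recall, f1 = contact_scores(pred_prob, gt_contact)
print(precision, recall, f1)
```

In this toy case three vertices are correctly predicted, one is a false positive, and one is missed, so all three scores come out to 0.75.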

Findings

01

BSTRO outperforms previous methods in contact prediction accuracy.

02

The RICH dataset provides 4K multiview video with detailed contact annotations.

03

BSTRO infers contact in occluded body regions by exploiting non-local relationships.
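One way to picture "non-local relationships" is self-attention, the core operation of a transformer: every vertex token is updated as a weighted mix of all other tokens, so an occluded vertex can draw on evidence from visible ones. The sketch below is generic scaled dot-product attention in NumPy, not BSTRO's actual architecture. The token count, feature size, and random projection matrices are illustrative assumptions.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens        : (num_tokens, dim) array, one row per vertex token
    w_q, w_k, w_v : (dim, dim) projection matrices (random here)
    Returns the updated tokens and the attention weights.
    """
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v

    # Pairwise similarity between every pair of tokens.
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Row-wise softmax, shifted by the row maximum for stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Each output row mixes the values of all tokens.
    return weights @ v, weights

rng = np.random.default_rng(0)
num_vertices, dim = 6, 4  # toy sizes, chosen for illustration
tokens = rng.normal(size=(num_vertices, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

out, weights = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each row of `weights` sums to 1 and is nonzero everywhere, which is what makes the update non-local: no vertex is restricted to its mesh neighbors.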

Abstract

Inferring human-scene contact (HSC) is the first step toward understanding how humans interact with their surroundings. While detecting 2D human-object interaction (HOI) and reconstructing 3D human pose and shape (HPS) have enjoyed significant progress, reasoning about 3D human-scene contact from a single image is still challenging. Existing HSC detection methods consider only a few types of predefined contact, often reduce body and scene to a small number of primitives, and even overlook image evidence. To predict human-scene contact from a single image, we address the limitations above from both data and algorithmic perspectives. We capture a new dataset called RICH for "Real scenes, Interaction, Contact and Humans." RICH contains multiview outdoor/indoor video sequences at 4K resolution, ground-truth 3D human bodies captured using markerless motion capture, 3D body scans, and high…

Code & Models

Models

recursivelabsai/model-evaluation-infrastructure (Hugging Face model)

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems