Capturing and Inferring Dense Full-Body Human-Scene Contact
Chun-Hao P. Huang, Hongwei Yi, Markus Höschle, Matvey Safroshkin, Tsvetelina Alexiadis, Senya Polikovsky, Daniel Scharstein, Michael J. Black

TL;DR
This paper introduces RICH, a dataset of human-scene contact with dense contact labels, and proposes BSTRO, a transformer-based model that predicts dense 3D human-scene contact on the body from a single image.
Contribution
The paper presents a new dataset RICH with detailed contact labels and a novel transformer-based model BSTRO for dense 3D human-scene contact prediction from single images.
Findings
BSTRO outperforms previous methods in contact prediction accuracy.
RICH dataset provides high-resolution multiview data with detailed contact annotations.
The approach effectively captures occluded contact regions using non-local relationships.
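To make "dense contact prediction accuracy" concrete, here is a minimal sketch of how per-vertex contact predictions could be scored against ground-truth labels. It assumes contact is represented as one binary label per body-mesh vertex and that the model outputs a per-vertex probability. The function name `contact_scores`, the 0.5 threshold, and the six-vertex toy data are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def contact_scores(pred_prob, gt_contact, threshold=0.5):
    """Precision, recall and F1 for per-vertex binary contact labels.

    pred_prob:  per-vertex contact probabilities in [0, 1] (hypothetical model output)
    gt_contact: per-vertex ground-truth labels, 1 = in contact, 0 = not in contact
    threshold:  assumed cut-off for turning probabilities into labels
    """
    pred = np.asarray(pred_prob) >= threshold
    gt = np.asarray(gt_contact).astype(bool)

    tp = np.sum(pred & gt)    # vertices correctly predicted as in contact
    fp = np.sum(pred & ~gt)   # predicted contact where there is none
    fn = np.sum(~pred & gt)   # missed contact vertices

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return float(precision), float(recall), float(f1)

# Toy example with six vertices; a real body mesh has thousands of vertices.
pred_prob = np.array([0.9, 0.8, 0.2, 0.6, 0.1, 0.4])
gt_contact = np.array([1, 1, 0, 0, 0, 1])
precision, recall, f1 = contact_scores(pred_prob, gt_contact)
print(precision, recall, f1)
```

Treating contact as a per-vertex label is what makes the prediction "dense": every point on the body surface gets its own decision, rather than a handful of predefined body parts.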
Abstract
Inferring human-scene contact (HSC) is the first step toward understanding how humans interact with their surroundings. While detecting 2D human-object interaction (HOI) and reconstructing 3D human pose and shape (HPS) have enjoyed significant progress, reasoning about 3D human-scene contact from a single image is still challenging. Existing HSC detection methods consider only a few types of predefined contact, often reduce body and scene to a small number of primitives, and even overlook image evidence. To predict human-scene contact from a single image, we address the limitations above from both data and algorithmic perspectives. We capture a new dataset called RICH for "Real scenes, Interaction, Contact and Humans." RICH contains multiview outdoor/indoor video sequences at 4K resolution, ground-truth 3D human bodies captured using markerless motion capture, 3D body scans, and high…
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
