Scene-Aware 3D Multi-Human Motion Capture from a Single Camera
Diogo Luvizon, Marc Habermann, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt

TL;DR
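This paper presents an affordable, easy-to-use method for 3D multi-human motion capture from a single RGB video recorded with a static camera. It combines large-scale pre-trained models with non-linear optimization to estimate each person's absolute 3D position, articulated pose, and body shape, together with the scale of the scene, in real-world conditions.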
Contribution
It introduces the first non-linear optimization approach that jointly estimates 3D human poses, shapes, scene depth, and scale from a single camera, outperforming previous methods.
Findings
Outperforms previous methods on established benchmarks.
Robust in challenging in-the-wild scenes with diverse human sizes.
Provides accurate 3D reconstructions from a single RGB video.
Abstract
In this work, we consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users as it enables an affordable 3D motion capture that is easy to install and does not require expert knowledge. To deal with this challenging setting, we leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks. Thus, we introduce the first non-linear optimization-based approach that jointly solves for the absolute 3D position of each human, their articulated pose, their individual shapes as well as the scale of the scene. In particular, we…
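The abstract names normalized disparity maps as one input and the scale of the scene as one unknown. The following is a minimal sketch of why a scale must be solved for, not the authors' method: it assumes normalized disparity relates to inverse metric depth by an unknown scale and shift (a common convention for monocular depth networks, not stated in this summary), and it uses a plain linear least-squares fit in place of the paper's joint non-linear optimization. The function names and the synthetic values are hypothetical.

```python
import numpy as np


def fit_disparity_scale_shift(norm_disp, metric_depth):
    """Fit (scale, shift) so that 1 / depth ~= scale * disparity + shift.

    Assumption (not from the paper): disparity is affine in inverse depth.
    """
    norm_disp = np.asarray(norm_disp, dtype=float)
    inv_depth = 1.0 / np.asarray(metric_depth, dtype=float)
    # Design matrix with one column for the scale and one for the shift.
    A = np.stack([norm_disp, np.ones_like(norm_disp)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, inv_depth, rcond=None)
    return scale, shift


def disparity_to_depth(norm_disp, scale, shift):
    """Convert normalized disparity to metric depth with a fitted scale and shift."""
    return 1.0 / (scale * np.asarray(norm_disp, dtype=float) + shift)


# Synthetic check: three pixels whose metric depth is assumed known,
# for example from people whose distance to the camera was estimated.
true_scale, true_shift = 0.4, 0.1
disp = np.array([0.2, 0.5, 0.9])
depth = 1.0 / (true_scale * disp + true_shift)

scale, shift = fit_disparity_scale_shift(disp, depth)
recovered = disparity_to_depth(disp, scale, shift)
```

In this toy setting the known depths anchor the otherwise scale-free disparity map. The paper instead solves for scale jointly with the human positions, poses, and shapes.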
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Video Surveillance and Tracking Methods
