TL;DR
This paper introduces a cost-effective, privacy-safe vision-based method for perceiving humans in 3D from a single image, useful for applications like social distancing and interaction detection in transportation systems.
Contribution
A novel neural network architecture that predicts human 3D locations and orientations with uncertainty from monocular images, without ground plane estimation or expensive sensors.
Findings
Accurately locates humans in 3D using a single camera.
Detects social interactions and safety compliance.
Works with fixed or moving cameras without ground plane estimation.
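As a rough illustration of how estimated 3D locations could feed a social distancing check, the sketch below flags pairs of people whose positions are closer than a threshold. The function name `close_pairs`, the 2-metre default, and the sample coordinates are assumptions made for illustration and are not taken from the paper. The sketch also ignores body orientation, which the method estimates as well.

```python
import math
from itertools import combinations

def close_pairs(locations, threshold=2.0):
    """Return index pairs (i, j), i < j, whose 3D distance is below threshold.

    locations: list of (x, y, z) coordinates in metres (assumed camera frame).
    threshold: hypothetical distancing limit in metres.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(locations), 2):
        # math.dist computes the Euclidean distance between two points.
        if math.dist(a, b) < threshold:
            pairs.append((i, j))
    return pairs

# Made-up positions: two people 1 m apart, a third far away.
people = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (10.0, 0.0, 5.0)]
print(close_pairs(people))
```

Because each location comes from a single image, a practical check would probably also weigh the uncertainty of every estimate instead of trusting the point positions alone.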
Abstract
Perceiving humans in the context of Intelligent Transportation Systems (ITS) often relies on multiple cameras or expensive LiDAR sensors. In this work, we present a new cost-effective vision-based method that perceives humans' 3D locations and body orientations from a single image. We address the challenges of the ill-posed monocular 3D task by proposing a neural network architecture that predicts confidence intervals rather than point estimates. Our neural network estimates human 3D body locations and orientations with a measure of uncertainty. Our proposed solution (i) is privacy-safe, (ii) works with any fixed or moving camera, and (iii) does not rely on ground plane estimation. We demonstrate the performance of our method on three applications: locating humans in 3D, detecting social interactions, and verifying the compliance of recent safety…
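To make the idea of predicting an interval instead of a point estimate concrete, here is a minimal sketch that assumes the network outputs a location `mu` and a scale `b` of a Laplace distribution over a person's distance. The abstract does not state which distribution or loss is used, so the Laplace choice and the names `laplace_nll` and `laplace_interval` are assumptions for illustration only.

```python
import math

def laplace_nll(target, mu, b):
    """Negative log-likelihood of target under Laplace(mu, b), with b > 0.

    Minimising this over training data lets a model learn both the
    estimate mu and its spread b (an assumed training objective).
    """
    if b <= 0:
        raise ValueError("scale b must be positive")
    return math.log(2.0 * b) + abs(target - mu) / b

def laplace_interval(mu, b, coverage=0.95):
    """Central interval of Laplace(mu, b) containing `coverage` probability mass.

    For a Laplace distribution, P(|X - mu| <= t) = 1 - exp(-t / b),
    so the half-width is t = -b * ln(1 - coverage).
    """
    if b <= 0 or not 0.0 < coverage < 1.0:
        raise ValueError("need b > 0 and 0 < coverage < 1")
    half_width = -b * math.log(1.0 - coverage)
    return mu - half_width, mu + half_width

# Hypothetical prediction: a person about 10 m away with scale 0.5 m.
low, high = laplace_interval(10.0, 0.5)
print(f"95% interval: [{low:.2f}, {high:.2f}] m")
```

A larger predicted `b` widens the interval, which is how a model of this kind can signal that a monocular distance estimate is less trustworthy.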
