TL;DR
This paper introduces a real-time, marker-less system for multi-person 3D pose estimation using RGB-Depth camera networks, capable of tracking multiple individuals without prior appearance or pose assumptions.
Contribution
It presents a novel multi-view 3D pose estimation system that combines CNN-based 2D pose detection with depth data, operating in real-time and supporting multiple persons without markers.
Findings
Outperforms baseline multi-view approaches in various scenarios
Operates in real-time suitable for interactive applications
Open source implementation available in OpenPTrack
Abstract
This paper proposes a novel system to estimate and track the 3D poses of multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D pose of each person is computed by a central node which receives the single-view outcomes from each camera of the network. Each single-view outcome is computed by using a CNN for 2D pose estimation and extending the resulting skeletons to 3D by means of the sensor depth. The proposed system is marker-less, multi-person, independent of background and does not make any assumption on people appearance and initial pose. The system provides real-time outcomes, thus being perfectly suited for applications requiring user interaction. Experimental results show the effectiveness of this work with respect to a baseline multi-view approach in different scenarios. To foster research and applications based on this work, we released the source code in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
