TL;DR
This paper introduces a real-time multi-view 3D human pose estimation system built on distributed smart edge sensors. A semantic feedback channel from the backend improves accuracy by bringing global context into each sensor's local 2D detections.
Contribution
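It proposes a semantic feedback loop between the backend and the sensors: the 3D pose estimated on the backend is backprojected into each sensor view and fused with the local 2D joint detections, which improves both 2D detection and 3D pose estimation in a distributed setup.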
Findings
Achieves state-of-the-art results on public datasets.
Feedback improves 2D joint detection accuracy.
System operates in real-time for multi-person scenarios.
Abstract
We present a novel method for estimation of 3D human poses from a multi-camera setup, employing distributed smart edge sensors coupled with a backend through a semantic feedback loop. 2D joint detection for each camera view is performed locally on a dedicated embedded inference processor. Only the semantic skeleton representation is transmitted over the network and raw images remain on the sensor board. 3D poses are recovered from 2D joints on a central backend, based on triangulation and a body model which incorporates prior knowledge of the human skeleton. A feedback channel from backend to individual sensors is implemented on a semantic level. The allocentric 3D pose is backprojected into the sensor views where it is fused with 2D joint detections. The local semantic model on each sensor can thus be improved by incorporating global context information. The whole pipeline is capable…
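The feedback step described in the abstract (backprojecting the allocentric 3D pose into a sensor view and fusing it with the local 2D detection) can be illustrated with a minimal sketch. The abstract does not say how the fusion is performed, so the confidence-weighted average below is an assumption chosen for illustration, not the authors' method. The function names `project_point` and `fuse_joint`, the pinhole model without skew or lens distortion, and all numeric values are likewise hypothetical.

```python
from typing import Sequence, Tuple

Point2D = Tuple[float, float]


def project_point(
    K: Sequence[Sequence[float]],
    R: Sequence[Sequence[float]],
    t: Sequence[float],
    X: Sequence[float],
) -> Point2D:
    """Project a 3D world point X into pixel coordinates.

    Assumes an ideal pinhole camera (no skew, no distortion):
    K is the 3x3 intrinsic matrix, (R, t) maps world to camera coordinates.
    """
    # World -> camera coordinates: Xc = R @ X + t
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the intrinsic mapping
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return (u, v)


def fuse_joint(
    detection: Point2D,
    det_conf: float,
    feedback: Point2D,
    fb_conf: float,
) -> Point2D:
    """Fuse a local 2D detection with the backprojected joint.

    ASSUMPTION: a confidence-weighted average stands in for whatever
    fusion the paper actually uses.
    """
    total = det_conf + fb_conf
    if total <= 0:
        # No usable confidence on either side: keep the local detection
        return detection
    u = (det_conf * detection[0] + fb_conf * feedback[0]) / total
    v = (det_conf * detection[1] + fb_conf * feedback[1]) / total
    return (u, v)


if __name__ == "__main__":
    # Hypothetical camera: focal length 1000 px, principal point (320, 240),
    # placed at the world origin and looking along +z.
    K = [[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]]
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t = [0.0, 0.0, 0.0]

    joint_3d = (0.1, -0.05, 2.0)  # joint position from the backend, in metres
    feedback_2d = project_point(K, R, t, joint_3d)
    fused = fuse_joint((360.0, 220.0), 0.5, feedback_2d, 0.5)
    print("backprojected:", feedback_2d, "fused:", fused)
```

In this sketch the feedback acts as a prior on the joint location: a low-confidence local detection (for example, an occluded joint) is pulled toward the position implied by the other views, while a confident detection is left largely unchanged.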
