LCR-Net++: Multi-person 2D and 3D Pose Detection in Natural Images
Gregory Rogez, Philippe Weinzaepfel, Cordelia Schmid

TL;DR
LCR-Net++ is an end-to-end architecture that detects multiple 2D and 3D human poses in natural images by generating, scoring, and refining pose proposals without needing prior human localization.
Contribution
It introduces a joint Localization-Classification-Regression (LCR) framework that estimates the 2D and 3D poses of multiple people in a natural image simultaneously, without a separate person-detection step.
Findings
Outperforms the state of the art in 3D pose estimation on Human3.6M.
Shows promising results on the MPII 2D pose benchmark.
Effective in multi-person scenarios with occlusions and truncations.
Abstract
We propose an end-to-end architecture for joint 2D and 3D human pose estimation in natural images. Key to our approach is the generation and scoring of a number of pose proposals per image, which allows us to predict 2D and 3D poses of multiple people simultaneously. Hence, our approach does not require an approximate localization of the humans for initialization. Our Localization-Classification-Regression architecture, named LCR-Net, contains 3 main components: 1) the pose proposal generator that suggests candidate poses at different locations in the image; 2) a classifier that scores the different pose proposals; and 3) a regressor that refines pose proposals both in 2D and 3D. All three stages share the convolutional feature layers and are trained jointly. The final pose estimation is obtained by integrating over neighboring pose hypotheses, which is shown to improve over a standard…
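The last step described in the abstract, integrating over neighboring pose hypotheses, can be illustrated with a minimal sketch. It is not the authors' implementation: the `Proposal` type, the `pose_distance` metric, the `radius` threshold, and the score-weighted averaging are all assumptions chosen for illustration, and the example uses 2D joints only. It assumes each proposal has already been scored by the classifier and refined by the regressor.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pose = List[Tuple[float, float]]  # one (x, y) pair per joint; 2D only in this sketch


@dataclass
class Proposal:
    pose: Pose     # joint coordinates after regression (hypothetical layout)
    score: float   # classifier score for this pose proposal


def pose_distance(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance between corresponding joints (assumed metric)."""
    total = sum(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
                for (ax, ay), (bx, by) in zip(a, b))
    return total / len(a)


def integrate_proposals(proposals: List[Proposal], radius: float,
                        min_score: float = 0.0) -> List[Proposal]:
    """Greedily group nearby proposals and average each group by score.

    Proposals with score <= min_score are dropped, so the weights used in
    the average are always positive when min_score >= 0.
    """
    remaining = sorted((p for p in proposals if p.score > min_score),
                       key=lambda p: p.score, reverse=True)
    results: List[Proposal] = []
    while remaining:
        seed = remaining[0]  # highest-scoring proposal not yet assigned
        group = [p for p in remaining
                 if pose_distance(p.pose, seed.pose) <= radius]
        remaining = [p for p in remaining
                     if pose_distance(p.pose, seed.pose) > radius]
        weight = sum(p.score for p in group)
        merged = [
            (sum(p.score * p.pose[j][0] for p in group) / weight,
             sum(p.score * p.pose[j][1] for p in group) / weight)
            for j in range(len(seed.pose))
        ]
        results.append(Proposal(merged, seed.score))
    return results


# Two overlapping hypotheses for one person, plus one for a second person.
candidates = [
    Proposal([(0.0, 0.0), (10.0, 0.0)], 0.75),
    Proposal([(2.0, 0.0), (12.0, 0.0)], 0.25),
    Proposal([(100.0, 100.0), (110.0, 100.0)], 0.5),
]
final_poses = integrate_proposals(candidates, radius=5.0)
```

In this toy input the first two candidates fall within `radius` of each other and are merged into one pose, while the third is kept as a separate person. A real system would also handle 3D joints and would need a distance threshold tuned to image scale.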
