Learning camera viewpoint using CNN to improve 3D body pose estimation
Mona Fathollahi Ghezelghieh, Rangachar Kasturi, Sudeep Sarkar

TL;DR
This paper demonstrates that incorporating camera viewpoint information into CNN-based models significantly enhances 3D human pose estimation accuracy from single RGB images, without relying on explicit perspective geometry models.
Contribution
The study introduces a novel approach of learning categorical camera viewpoint with a CNN to improve 3D pose estimation, using synthetically rendered training images to make the network robust to clothing and body shape.
Findings
Achieves up to 20% error reduction on the Human3.6m benchmark compared to state-of-the-art approaches.
Camera viewpoint significantly improves 3D pose accuracy.
Synthetic training data enhances model robustness.
Abstract
The objective of this work is to estimate 3D human pose from a single RGB image. Extracting image representations which incorporate both the spatial relation of body parts and their relative depth plays an essential role in accurate 3D pose reconstruction. In this paper, for the first time, we show that camera viewpoint in combination with 2D joint locations significantly improves 3D pose accuracy without the explicit use of perspective geometry mathematical models. To this end, we train a deep Convolutional Neural Network (CNN) to learn categorical camera viewpoint. To make the network robust against clothing and body shape of the subject in the image, we utilized 3D computer rendering to synthesize additional training images. We test our framework on the largest 3D pose estimation benchmark, Human3.6m, and achieve up to 20% error reduction compared to the state-of-the-art approaches that…
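As an illustration of what "categorical camera viewpoint" could mean in practice, the following is a minimal sketch that discretizes a camera azimuth angle into class labels suitable as CNN classification targets. The abstract does not state how many viewpoint classes the authors use or how the angle is defined, so the 8-bin default, the azimuth convention, and the function names `viewpoint_class` and `one_hot` are assumptions made for this example only.

```python
def viewpoint_class(azimuth_deg: float, num_bins: int = 8) -> int:
    """Map a camera azimuth (degrees) to one of num_bins viewpoint classes.

    Assumption: bins are centered on multiples of 360 / num_bins, so class 0
    covers angles around 0 degrees, including the wrap-around from 360.
    """
    bin_width = 360.0 / num_bins
    # Shift by half a bin so that bin centers fall on multiples of bin_width,
    # then wrap into [0, 360).
    shifted = (azimuth_deg + bin_width / 2.0) % 360.0
    # Guard against a floating-point result of exactly 360.0.
    return min(int(shifted // bin_width), num_bins - 1)


def one_hot(index: int, num_bins: int = 8) -> list:
    """Encode a class index as a one-hot target vector."""
    return [1.0 if i == index else 0.0 for i in range(num_bins)]


if __name__ == "__main__":
    for angle in (0.0, 45.0, 180.0, 350.0):
        label = viewpoint_class(angle)
        print(angle, label, one_hot(label))
```

Treating viewpoint as a class label rather than a continuous angle turns the task into standard classification, which is consistent with the abstract's description of a CNN that learns categorical viewpoint; the bin layout itself is a design choice the abstract leaves open.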
