Towards Viewpoint Invariant 3D Human Pose Estimation
Albert Haque, Boya Peng, Zelun Luo, Alexandre Alahi, Serena Yeung, Li Fei-Fei

TL;DR
This paper introduces a viewpoint invariant model for 3D human pose estimation from a single depth image. The model handles occlusion and noise, and it performs well across diverse viewpoints.
Contribution
The paper formulates viewpoint invariant 3D pose estimation as a multi-task learning problem and solves it with a convolutional and recurrent architecture whose top-down error feedback mechanism corrects previous pose estimates.
Findings
Achieves state-of-the-art performance on non-frontal viewpoints.
Performs competitively on frontal views.
Selectively predicts partial poses when depth images contain noise or occlusion.
Abstract
We propose a viewpoint invariant model for 3D human pose estimation from a single depth image. To achieve this, our discriminative model embeds local regions into a learned viewpoint invariant feature space. Formulated as a multi-task learning problem, our model is able to selectively predict partial poses in the presence of noise and occlusion. Our approach leverages a convolutional and recurrent network architecture with a top-down error feedback mechanism to self-correct previous pose estimates in an end-to-end manner. We evaluate our model on a previously published depth dataset and a newly collected human pose dataset containing 100K annotated depth images from extreme viewpoints. Experiments show that our model achieves competitive performance on frontal views while achieving state-of-the-art performance on alternate viewpoints.
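The error feedback idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each refinement step predicts a per-joint 3D offset that is added to the previous estimate, and that joints flagged as occluded are simply left unpredicted. The names `refine_pose`, `select_visible`, and `toy_predictor` are hypothetical, and `toy_predictor` is a stand-in for the learned convolutional and recurrent network.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Joint = Tuple[float, float, float]
Pose = List[Joint]


def refine_pose(
    initial_pose: Sequence[Joint],
    predict_offsets: Callable[[Pose], Pose],
    steps: int,
) -> Pose:
    """Refine a pose by repeatedly adding predicted per-joint offsets.

    Assumption: the feedback loop is additive, i.e. each step outputs a
    correction to the previous estimate rather than a fresh pose.
    """
    pose: Pose = [tuple(j) for j in initial_pose]
    for _ in range(steps):
        offsets = predict_offsets(pose)
        pose = [
            tuple(p + d for p, d in zip(joint, offset))
            for joint, offset in zip(pose, offsets)
        ]
    return pose


def select_visible(
    pose: Sequence[Joint], visible: Sequence[bool]
) -> List[Optional[Joint]]:
    """Keep only joints flagged visible; occluded joints become None."""
    return [joint if v else None for joint, v in zip(pose, visible)]


def toy_predictor(target: Sequence[Joint], gain: float = 0.5) -> Callable[[Pose], Pose]:
    """Hypothetical stand-in for the learned network.

    It moves every joint a fixed fraction of the way toward a known target,
    which a real model would have to infer from the depth image.
    """
    def predict(pose: Pose) -> Pose:
        return [
            tuple(gain * (t - p) for p, t in zip(joint, tgt))
            for joint, tgt in zip(pose, target)
        ]
    return predict


if __name__ == "__main__":
    target = [(1.0, 2.0, 3.0), (0.0, 0.0, 0.0)]
    start = [(0.0, 0.0, 0.0), (4.0, 4.0, 4.0)]
    refined = refine_pose(start, toy_predictor(target), steps=10)
    # The second joint is treated as occluded and is not reported.
    print(select_visible(refined, [True, False]))
```

With the toy predictor, the remaining error halves at every step, which shows why a few feedback iterations can correct a poor initial estimate. How the actual model predicts offsets and visibility is not specified in the abstract.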
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging
