Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation
Krishna Kanth Nakka, Mathieu Salzmann

TL;DR
This paper critically evaluates how well current self-supervised methods disentangle pose from appearance in 3D human pose estimation. It finds that significant appearance information remains in the pose codes, which calls the effectiveness of existing approaches into question.
Contribution
The paper provides an in-depth analysis of state-of-the-art disentanglement methods, introducing novel tests to assess the true separation of pose and appearance in self-supervised learning.
Findings
Disentanglement in current methods is incomplete.
Pose codes still contain significant appearance information.
Existing frameworks are less robust to appearance changes than expected.
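One generic way to check whether a pose code carries appearance information is a probe: fit a simple classifier that predicts an appearance label, such as subject identity, from the pose code alone, and compare its accuracy to chance. The sketch below is only an illustration of that idea and is not the set of tests used in the paper. The codes are synthetic, and the names `make_codes`, `nearest_centroid_probe`, and `leakage_accuracy`, as well as the `leak` parameter, are hypothetical.

```python
import numpy as np


def make_codes(rng, n_subjects=4, n_per_subject=100, dim=16, leak=0.0):
    """Synthetic stand-in for pose codes (assumption, not real encoder output).

    Each code is random "pose" noise plus a subject-specific offset scaled
    by `leak`; leak=0 means the code carries no subject information.
    """
    offsets = rng.normal(size=(n_subjects, dim))
    ids = np.repeat(np.arange(n_subjects), n_per_subject)
    codes = rng.normal(size=(ids.size, dim)) + leak * offsets[ids]
    return codes, ids


def nearest_centroid_probe(train_codes, train_ids, test_codes, test_ids):
    """Predict subject identity from codes; return test accuracy."""
    labels = np.unique(train_ids)
    centroids = np.stack(
        [train_codes[train_ids == k].mean(axis=0) for k in labels]
    )
    # Squared Euclidean distance from every test code to every centroid.
    diffs = test_codes[:, None, :] - centroids[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    preds = labels[dists.argmin(axis=1)]
    return float((preds == test_ids).mean())


def leakage_accuracy(leak, seed=0):
    """Run the probe on synthetic codes with a given amount of leakage."""
    rng = np.random.default_rng(seed)
    codes, ids = make_codes(rng, leak=leak)
    # Even rows for fitting the probe, odd rows for evaluating it.
    return nearest_centroid_probe(codes[0::2], ids[0::2], codes[1::2], ids[1::2])


if __name__ == "__main__":
    for leak in (0.0, 5.0):
        print(f"leak={leak}: probe accuracy = {leakage_accuracy(leak):.2f}")
```

With four subjects, chance accuracy is 0.25. A probe that scores well above chance on real pose codes would suggest that appearance information has leaked into them, which is the kind of incomplete disentanglement the findings describe.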
Abstract
As 3D human pose estimation can now be achieved with very high accuracy in the supervised learning scenario, tackling the case where 3D pose annotations are not available has received increasing attention. In particular, several methods have proposed to learn image representations in a self-supervised fashion so as to disentangle the appearance information from the pose one. The methods then only need a small amount of supervised data to train a pose regressor using the pose-related latent vector as input, as it should be free of appearance information. In this paper, we carry out in-depth analysis to understand to what degree the state-of-the-art disentangled representation learning methods truly separate the appearance information from the pose one. First, we study disentanglement from the perspective of the self-supervised network, via diverse image synthesis experiments. Second, we…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Adversarial Robustness in Machine Learning
