Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose Estimation
Chenxin Xu, Siheng Chen, Maosen Li, Ya Zhang

TL;DR
This paper introduces an unsupervised 3D human pose estimation method based on a teacher-student framework, in which cycle-consistent architectures promote 3D rotation invariance (teacher) and rotation equivariance (student), achieving state-of-the-art results without 3D annotations.
Contribution
It proposes a novel unsupervised approach combining pose-dictionary regularization, cycle-consistent architectures, and graph convolution networks for improved 3D human pose estimation.
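The pose-dictionary idea behind the teacher can be illustrated with a minimal NumPy sketch. It assumes a canonical 3D pose written as a weighted sum of K basis poses, a global rotation parameterized as an axis-angle vector, and orthographic projection to 2D. The function names (`rotation_matrix`, `dictionary_pose`, `project`, `reprojection_loss`), the random bases, and the 17-joint layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rotation_matrix(axis_angle):
    """Rodrigues' formula: axis-angle vector (3,) -> 3x3 rotation matrix."""
    axis_angle = np.asarray(axis_angle, dtype=float)
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        return np.eye(3)
    k = axis_angle / theta  # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def dictionary_pose(coeffs, bases):
    """Canonical 3D pose as a weighted sum of K basis poses.
    coeffs: (K,), bases: (K, 3, J) -> pose: (3, J)."""
    return np.tensordot(coeffs, bases, axes=1)

def project(pose3d):
    """Orthographic projection: keep the x and y rows of a (3, J) pose."""
    return pose3d[:2]

def reprojection_loss(pose2d, coeffs, axis_angle, bases):
    """Mean squared error between the observed 2D pose and the
    projection of the rotated dictionary pose."""
    pose3d = rotation_matrix(axis_angle) @ dictionary_pose(coeffs, bases)
    return float(np.mean((project(pose3d) - pose2d) ** 2))

# Usage with hypothetical random bases and 17 joints.
rng = np.random.default_rng(0)
bases = rng.normal(size=(4, 3, 17))
coeffs = rng.normal(size=4)
axis_angle = np.array([0.1, 0.5, -0.2])
pose2d = project(rotation_matrix(axis_angle) @ dictionary_pose(coeffs, bases))
print(reprojection_loss(pose2d, coeffs, axis_angle, bases))  # near zero
```

Because only the 2D projection is observed, different (rotation, coefficient) pairs can fit the same 2D pose equally well. This is one way to picture the decomposition ambiguity that the rotation-invariant, cycle-consistent training of the teacher is meant to address.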
Findings
Reduces 3D joint prediction error by 11.4% on Human3.6M
Outperforms many weakly-supervised methods
Demonstrates effectiveness on multiple datasets
Abstract
We propose a novel method based on a teacher-student learning framework for 3D human pose estimation without any 3D annotation or side information. To solve this unsupervised-learning problem, the teacher network adopts pose-dictionary-based modeling as regularization to estimate a physically plausible 3D pose. To handle the decomposition ambiguity in the teacher network, we propose a cycle-consistent architecture that promotes a 3D rotation-invariant property to train the teacher network. To further improve estimation accuracy, the student network adopts a novel graph convolution network, chosen for its flexibility, to directly estimate the 3D coordinates. Another cycle-consistent architecture, which promotes a 3D rotation-equivariant property, is adopted to exploit geometric consistency, together with knowledge distillation from the teacher network, to improve pose estimation performance. We conduct…
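The two cycle-consistency constraints and the distillation term can be illustrated with a minimal NumPy sketch. It assumes orthographic projection, a teacher callable that returns a canonical (rotation-free) pose plus a 3x3 rotation, a student callable that returns camera-frame 3D joints, and mean-squared-error losses. The function names, the loss forms, and the zero-depth stand-in networks in the usage section are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def random_rotation(rng):
    """Sample a random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))  # fix column signs
    if np.linalg.det(q) < 0:              # ensure det = +1 (proper rotation)
        q[:, 0] = -q[:, 0]
    return q

def project(pose3d):
    """Orthographic projection of a (3, J) pose onto the image plane."""
    return pose3d[:2]

def rotation_invariance_loss(teacher, pose2d, R):
    """Teacher cycle: lift, rotate by R, reproject, lift again.
    The canonical (rotation-free) pose should be unchanged."""
    canonical, rot = teacher(pose2d)
    pose2d_rot = project(R @ rot @ canonical)
    canonical_cycle, _ = teacher(pose2d_rot)
    return float(np.mean((canonical - canonical_cycle) ** 2))

def rotation_equivariance_loss(student, pose2d, R):
    """Student cycle: rotating the estimated 3D pose by R and re-estimating
    from its projection should give R applied to the original estimate."""
    pose3d = student(pose2d)
    pose3d_cycle = student(project(R @ pose3d))
    return float(np.mean((R @ pose3d - pose3d_cycle) ** 2))

def distillation_loss(student, teacher, pose2d):
    """Pull the student's 3D estimate toward the teacher's camera-frame pose."""
    canonical, rot = teacher(pose2d)
    return float(np.mean((student(pose2d) - rot @ canonical) ** 2))

# Usage with hypothetical stand-in networks that lift 2D poses with zero depth.
def zero_depth(pose2d):
    return np.vstack([pose2d, np.zeros((1, pose2d.shape[1]))])

def naive_teacher(pose2d):
    return zero_depth(pose2d), np.eye(3)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 17))
c, s = np.cos(0.3), np.sin(0.3)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # in-plane rotation
print(rotation_equivariance_loss(zero_depth, x, Rz))     # near zero
print(rotation_invariance_loss(naive_teacher, x, Rz))    # positive
print(rotation_equivariance_loss(zero_depth, x, random_rotation(rng)))  # positive
```

The zero-depth stand-ins only show how the losses behave: a lifter that ignores depth happens to be equivariant to in-plane rotations but not to out-of-plane ones, and its "canonical" pose changes with the rotation, so it is not rotation-invariant. Trained networks would be optimized to drive all three terms down.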
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Gait Recognition and Analysis
Methods: Knowledge Distillation · Convolution
