TL;DR
This paper introduces a geometry-aware representation learning method for 3D human pose estimation that learns from unannotated multi-view images. This substantially reduces the amount of 3D-labeled data needed, and given the same amount of labeled data the method outperforms fully-supervised approaches.
Contribution
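It proposes an unsupervised approach that learns a 3D geometry-aware body representation from unannotated multi-view images by training an encoder-decoder to predict one view from another, and then uses this representation to improve semi-supervised 3D pose estimation.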
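The view-prediction objective can be illustrated with a toy NumPy sketch. This is not the authors' implementation, and several details are assumptions made for illustration: images are flattened vectors, the encoder and decoder are single linear maps (hypothetical `W_enc`, `W_dec`), and the latent code is treated as K 3D points that are rotated by a known relative camera rotation `R_ij` before decoding. The summary only states that an encoder-decoder predicts one view from another; the explicit rotation is one way to force the latent code to behave like 3D geometry.

```python
import numpy as np


def rotation_y(theta: float) -> np.ndarray:
    """Rotation about the vertical (y) axis, used here as a stand-in relative camera rotation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def forward(W_enc, W_dec, img_i, R_ij):
    """Encode view i into K latent 3D points, rotate them into view j, decode an image of view j."""
    z = W_enc @ img_i                 # (3K,) latent code (assumed linear encoder)
    Z = z.reshape(-1, 3)              # (K, 3) latent points in camera-i coordinates
    Z_rot = Z @ R_ij.T                # (K, 3) same points expressed in camera-j coordinates
    v = Z_rot.reshape(-1)             # (3K,) flattened rotated code
    pred_j = W_dec @ v                # (D,) predicted image of view j (assumed linear decoder)
    return Z_rot, v, pred_j


def loss_and_grads(W_enc, W_dec, img_i, img_j, R_ij):
    """Mean squared error between predicted and observed view j, with hand-derived gradients."""
    _, v, pred_j = forward(W_enc, W_dec, img_i, R_ij)
    r = pred_j - img_j
    D = img_j.size
    loss = float(np.sum(r ** 2) / D)
    d_pred = 2.0 * r / D              # dL/d(pred_j)
    g_dec = np.outer(d_pred, v)       # dL/dW_dec
    d_Zrot = (W_dec.T @ d_pred).reshape(-1, 3)
    d_Z = d_Zrot @ R_ij               # backpropagate through Z_rot = Z @ R_ij.T
    g_enc = np.outer(d_Z.reshape(-1), img_i)  # dL/dW_enc
    return loss, g_enc, g_dec


def train_step(W_enc, W_dec, img_i, img_j, R_ij, lr=0.05):
    """One plain gradient-descent step on a single (view i, view j) image pair."""
    loss, g_enc, g_dec = loss_and_grads(W_enc, W_dec, img_i, img_j, R_ij)
    return W_enc - lr * g_enc, W_dec - lr * g_dec, loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, K = 12, 4                                   # toy image size and number of latent 3D points
    W_enc = 0.1 * rng.standard_normal((3 * K, D))
    W_dec = 0.1 * rng.standard_normal((D, 3 * K))
    img_i, img_j = rng.standard_normal(D), rng.standard_normal(D)  # stand-ins for two synchronized views
    R_ij = rotation_y(np.pi / 6)                   # assumed known relative rotation between the cameras
    for step in range(200):
        W_enc, W_dec, loss = train_step(W_enc, W_dec, img_i, img_j, R_ij)
    print(f"loss after 200 steps: {loss:.4f}")
```

No pose labels appear anywhere in this loop. The only supervision is the second view itself, which is what makes the objective usable on unannotated multi-view footage. A practical version would replace the linear maps with convolutional networks and train on batches of view pairs.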
Findings
Outperforms fully-supervised methods trained on the same amount of labeled data
Improves over semi-supervised methods when as little as 1% of the data is labeled
Effectively learns 3D geometry from unannotated multi-view images
Abstract
Modern 3D human pose estimation techniques rely on deep networks, which require large amounts of training data. While weakly-supervised methods require less supervision, by utilizing 2D poses or multi-view imagery without annotations, they still need a sufficiently large set of samples with 3D annotations for learning to succeed. In this paper, we propose to overcome this problem by learning a geometry-aware body representation from multi-view images without annotations. To this end, we use an encoder-decoder that predicts an image from one viewpoint given an image from another viewpoint. Because this representation encodes 3D geometry, using it in a semi-supervised setting makes it easier to learn a mapping from it to 3D human pose. As evidenced by our experiments, our approach significantly outperforms fully-supervised methods given the same amount of labeled data, and improves over…
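The semi-supervised step, learning a mapping from the learned representation to 3D pose with only a few labels, can be sketched as ridge regression on frozen latent codes. This is an illustrative assumption: the summary does not specify the form of the pose regressor, and the helper names (`fit_pose_regressor`, `predict_pose`, `labeled_subset`) are hypothetical. In practice the latent codes would come from the unsupervised encoder; here they are random stand-ins, and `fraction=0.01` mirrors the 1% labeled setting mentioned in the findings. Errors are reported with MPJPE (mean per-joint position error), a standard 3D pose metric chosen here for the example.

```python
import numpy as np


def fit_pose_regressor(latents, poses, lam=1e-3):
    """Ridge regression from frozen latent codes (N, F) to flattened 3D poses (N, 3J)."""
    X = np.hstack([latents, np.ones((latents.shape[0], 1))])  # append a bias column
    A = X.T @ X + lam * np.eye(X.shape[1])                   # regularized normal equations
    return np.linalg.solve(A, X.T @ poses)                   # (F + 1, 3J) weights


def predict_pose(weights, latents):
    """Apply the fitted linear map (including bias) to latent codes."""
    X = np.hstack([latents, np.ones((latents.shape[0], 1))])
    return X @ weights


def mpjpe(pred, target, n_joints):
    """Mean per-joint position error: Euclidean distance averaged over joints and samples."""
    diff = (pred - target).reshape(-1, n_joints, 3)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))


def labeled_subset(n_samples, fraction, rng):
    """Indices of a random labeled subset, e.g. fraction=0.01 for 1% of the data."""
    n_labeled = max(1, int(round(fraction * n_samples)))
    return rng.choice(n_samples, size=n_labeled, replace=False)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, F, J = 5000, 24, 17                         # samples, latent size, joints (toy values)
    latents = rng.standard_normal((N, F))          # stand-in for frozen encoder outputs
    A_true = rng.standard_normal((F, 3 * J))
    poses = latents @ A_true + 0.01 * rng.standard_normal((N, 3 * J))
    idx = labeled_subset(N, 0.01, rng)             # 1% labeled: 50 samples
    W = fit_pose_regressor(latents[idx], poses[idx])
    print("MPJPE on all samples:", mpjpe(predict_pose(W, latents), poses, J))
```

The point of the sketch is the division of labor: if the unsupervised stage already produces a representation from which pose is close to a simple function, only a small labeled subset is needed to fit the final mapping.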
