Generalized Pose Space Embeddings for Training In-the-Wild using Analysis-by-Synthesis
Dominik Borer, Jakob Buhmann, Martin Guay

TL;DR
This paper introduces a more expressive intermediate skeleton representation that captures left/right semantics, together with a synthetic-data training protocol, within an analysis-by-synthesis framework. Together they significantly reduce pose flips and improve accuracy for in-the-wild human pose estimation.
Contribution
The paper proposes a more expressive intermediate skeleton representation and a synthetic-data training method that together improve pose estimation accuracy and reduce left/right flips.
Findings
Fewer pose flips with the new representation.
Improved accuracy on standard benchmarks.
Outperforms previous analysis-by-synthesis models.
Abstract
Modern pose estimation models are trained on large, manually-labelled datasets which are costly and may not cover the full extent of human poses and appearances in the real world. With advances in neural rendering, analysis-by-synthesis, with its ability to not only predict but also render the pose, is becoming an appealing framework that could alleviate the need for large-scale manual labelling efforts. While recent work has shown the feasibility of this approach, the predictions admit many flips due to a simplistic intermediate skeleton representation, resulting in low precision and inhibiting the acquisition of any downstream knowledge such as three-dimensional positioning. We solve this problem with a more expressive intermediate skeleton representation capable of capturing the semantics of the pose (left and right), which significantly reduces flips. To successfully train this…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging
