V-VIPE: Variational View Invariant Pose Embedding
Mara Levy, Abhinav Shrivastava

TL;DR
V-VIPE introduces a variational autoencoder-based embedding that captures 3D human poses in a view-invariant canonical space, enabling improved pose comparison, retrieval, and generation across different camera views.
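The sketch below makes "view-invariant canonical space" concrete with one common way to canonicalize a 3D pose before comparing it. It is an illustrative assumption, not the paper's procedure. It assumes a y-up coordinate system, a pelvis root joint, and hypothetical hip joint indices. It removes only global translation and rotation about the vertical axis, so it models a change in camera yaw rather than an arbitrary viewpoint.

```python
import math
from typing import List, Tuple

Joint = Tuple[float, float, float]


def rotate_y(pose: List[Joint], angle: float) -> List[Joint]:
    """Rotate every joint about the vertical y-axis (a stand-in for a camera yaw change)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in pose]


def canonicalize(pose: List[Joint], root: int = 0,
                 left_hip: int = 1, right_hip: int = 2) -> List[Joint]:
    """Hypothetical canonical frame: root at the origin, hip vector aligned with +x.

    Assumes y is vertical; the joint indices are illustrative, not from the paper.
    """
    rx, ry, rz = pose[root]
    # Remove global translation by centering the pose on the root joint.
    centered = [(x - rx, y - ry, z - rz) for x, y, z in pose]
    # Heading angle of the left-to-right hip vector in the horizontal x-z plane.
    hx = centered[right_hip][0] - centered[left_hip][0]
    hz = centered[right_hip][2] - centered[left_hip][2]
    heading = math.atan2(hz, hx)
    # Rotating by the heading angle brings the hip vector onto +x.
    return rotate_y(centered, heading)


def max_joint_gap(a: List[Joint], b: List[Joint]) -> float:
    """Largest per-coordinate difference between two poses with the same joint order."""
    return max(abs(p - q) for ja, jb in zip(a, b) for p, q in zip(ja, jb))


# Usage: the same toy body seen from a different yaw and position.
pose = [(0.0, 1.0, 0.0), (-0.1, 1.0, 0.05), (0.1, 1.0, -0.05), (0.0, 1.5, 0.1)]
other_view = [(x + 2.0, y, z - 3.0) for x, y, z in rotate_y(pose, 0.7)]
gap = max_joint_gap(canonicalize(pose), canonicalize(other_view))
```

After canonicalization the two views coincide up to floating-point error. A learned embedding such as V-VIPE is meant to give this kind of invariance without hand-picked joints.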
Contribution
The paper proposes V-VIPE, a novel view-invariant 3D pose embedding using a variational autoencoder, facilitating diverse applications like pose retrieval, classification, and unseen pose generation.
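The contribution rests on a variational autoencoder. The sketch below is a generic reminder of VAE mechanics, not code from the paper. It shows the reparameterization step, the closed-form KL term against a standard normal prior, and sampling from that prior, which is the usual route to generating unseen samples. Plain Python lists stand in for tensors. The mean-squared reconstruction term, the `beta` weight, and all function names are assumptions.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual VAE regularizer."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(reconstruction: Sequence[float], target: Sequence[float],
             mu: Sequence[float], logvar: Sequence[float], beta: float = 1.0) -> float:
    """Assumed objective: mean squared reconstruction error plus beta times the KL term."""
    mse = sum((r - t) ** 2 for r, t in zip(reconstruction, target)) / len(target)
    return mse + beta * kl_to_standard_normal(mu, logvar)


def sample_prior(dim: int, rng: random.Random) -> List[float]:
    """Sample a latent code from N(0, I); a trained decoder would map it to a new 3D pose."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


# Usage with toy numbers (no trained encoder or decoder involved).
rng = random.Random(0)
z = reparameterize([0.2, -0.1], [-1.0, -1.0], rng)
new_code = sample_prior(2, rng)
```

The KL term pulls encoded poses toward a smooth, densely packed latent space. That property makes decoding random prior samples into plausible new poses reasonable.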
Findings
Embeds 3D poses in a canonical space for view-invariance.
Enables 3D pose estimation from 2D images by decoding the shared embedding.
Supports generation of unseen 3D poses.
Abstract
Learning to represent three-dimensional (3D) human pose given a two-dimensional (2D) image of a person is a challenging problem. To make the problem less ambiguous, it has become common practice to estimate 3D pose in the camera coordinate space. However, this makes the task of comparing two 3D poses difficult. In this paper, we address this challenge by separating the problem of estimating 3D pose from 2D images into two steps. We use a variational autoencoder (VAE) to find an embedding that represents 3D poses in canonical coordinate space. We refer to this embedding as variational view-invariant pose embedding (V-VIPE). Using V-VIPE, we can encode 2D and 3D poses and use the embedding for downstream tasks, like retrieval and classification. We can estimate 3D poses from these embeddings using the decoder, as well as generate unseen 3D poses. The variability of our encoding…
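The abstract says that downstream tasks such as retrieval and classification operate directly on the embedding. The sketch below assumes each pose, whether from a 2D or a 3D input, has already been encoded into a fixed-length vector, for example the VAE mean. Under that assumption, retrieval is a k-nearest-neighbour search and classification is a majority vote over those neighbours. Euclidean distance, the toy vectors, and the labels are hypothetical choices, not details from the paper.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query: Sequence[float], database: List[Sequence[float]],
             k: int = 1) -> List[Tuple[int, float]]:
    """Return (index, distance) for the k database embeddings nearest to the query."""
    scored = ((i, euclidean(query, emb)) for i, emb in enumerate(database))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


def knn_classify(query: Sequence[float], database: List[Sequence[float]],
                 labels: List[str], k: int = 3) -> str:
    """Majority label among the k nearest embeddings; ties go to the nearer neighbour's label."""
    neighbours = retrieve(query, database, k)
    votes = Counter(labels[i] for i, _ in neighbours)
    return votes.most_common(1)[0][0]


# Usage with hypothetical 2D embeddings and action labels.
database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
labels = ["stand", "squat", "jump"]
best = retrieve([0.9, 1.1], database, k=2)
label = knn_classify([0.9, 1.1], database, labels, k=1)
```

Because the embedding is meant to be view-invariant, two images of the same pose taken from different cameras should land close together. That is what lets a plain distance search serve as cross-view retrieval.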
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Mechanisms and Dynamics · Human Pose and Action Recognition
