VIBE: Video Inference for Human Body Pose and Shape Estimation

Muhammed Kocabas, Nikos Athanasiou, and Michael J. Black

arXiv:1912.05656 · cs.CV · May 1, 2020

TL;DR

VIBE introduces an adversarial learning framework that leverages large-scale motion capture data and in-the-wild 2D keypoints to produce accurate, natural, and kinematically plausible 3D human motion sequences from videos.

Contribution

The paper presents a novel adversarial training approach for video-based 3D human pose and shape estimation that does not require in-the-wild ground-truth 3D motion data.

Findings

01. Achieves state-of-the-art performance on challenging datasets.

02. Produces natural and kinematically plausible motion sequences.

03. Effectively leverages large-scale mocap data with unpaired 2D keypoints.

Abstract

Human motion is fundamental to understanding behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methods fail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose Video Inference for Body Pose and Shape Estimation (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a temporal network architecture and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. We…
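
The sequence-level adversarial objective described above can be illustrated with a minimal sketch. It assumes a least-squares GAN formulation, which the abstract does not specify, and it takes the motion discriminator's scores as plain lists of floats, one score per motion sequence. The function names `discriminator_loss` and `generator_adversarial_loss` are hypothetical and are not taken from the VIBE code.

```python
def discriminator_loss(real_scores, fake_scores):
    """Least-squares discriminator objective (assumed form).

    real_scores: scores for real mocap sequences (e.g. from AMASS).
    fake_scores: scores for sequences produced by the regressor.
    Real sequences are pushed toward 1, generated ones toward 0.
    """
    real_term = sum((s - 1.0) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum(s ** 2 for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_adversarial_loss(fake_scores):
    """Adversarial term for the regressor (assumed form).

    The regressor is rewarded when the discriminator scores its
    generated sequences close to 1, i.e. as if they were real motion.
    """
    return sum((s - 1.0) ** 2 for s in fake_scores) / len(fake_scores)


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the paper.
    real = [0.9, 0.8]
    fake = [0.2, 0.4]
    print("discriminator loss:", discriminator_loss(real, fake))
    print("generator adversarial loss:", generator_adversarial_loss(fake))
```

In a full training setup this adversarial term would be combined with the supervised losses on 2D keypoints and any available 3D labels. The sketch leaves that weighting out because the text above does not state it.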

Code & Models

Videos

VIBE: Video Inference for Human Body Pose and Shape Estimation · YouTube

Taxonomy

Methods: 1x1 Convolution · Average Pooling · Batch Normalization · Residual Connection · Gated Recurrent Unit · Max Pooling · Global Average Pooling · Bottleneck Residual Block