PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling

Fabien Baradel; Romain Brégier; Thibault Groueix; Philippe Weinzaepfel; Yannis Kalantidis; Grégory Rogez

arXiv:2208.10211·cs.CV·October 20, 2022·1 cites

Open Access · 1 Repo

TL;DR

PoseBERT is a versatile transformer module trained on 3D MoCap data that can enhance various human pose estimation models in videos by leveraging temporal information without task-specific finetuning.

Contribution

The paper introduces PoseBERT, a generic, task-agnostic transformer module trained on 3D MoCap data. It can be plugged on top of existing image-based models to improve human pose modeling in videos.

Findings

01 Consistently improves state-of-the-art pose estimation methods

02 Enables real-time animation of robotic hands

03 Versatile application across pose refinement, prediction, and motion completion

Abstract

Training state-of-the-art models for human pose estimation in videos requires datasets with annotations that are difficult and expensive to obtain. Although transformers have recently been used for body pose sequence modeling, related methods rely on pseudo-ground truth to augment the limited training data currently available for learning such models. In this paper, we introduce PoseBERT, a transformer module that is trained entirely on 3D Motion Capture (MoCap) data via masked modeling. It is simple, generic and versatile, as it can be plugged on top of any image-based model to transform it into a video-based model that leverages temporal information. We showcase variants of PoseBERT with different inputs, ranging from 3D skeleton keypoints to rotations of a 3D parametric model for either the full body (SMPL) or just the hands (MANO). Since PoseBERT training is task agnostic, the model can…
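To make the idea of masked modeling on pose sequences more concrete, here is a minimal sketch of the input-corruption step only. It is an illustration under stated assumptions and not the authors' implementation: the function name `mask_pose_sequence`, the `mask_ratio` parameter, and the all-zero placeholder used as the mask token are hypothetical choices, and each frame is assumed to be a flat list of floats (for example flattened joint coordinates or rotations). A model trained this way would receive the corrupted sequence and be asked to reconstruct the original frames.

```python
import random
from typing import List, Sequence, Tuple

Pose = List[float]  # one frame: a flattened pose vector (assumed representation)


def mask_pose_sequence(
    poses: Sequence[Pose],
    mask_ratio: float,
    rng: random.Random,
) -> Tuple[List[Pose], List[bool]]:
    """Hide a random subset of frames in a non-empty pose sequence.

    Returns the corrupted sequence and a per-frame flag that is True
    where the frame was replaced by the mask token.
    """
    num_frames = len(poses)
    num_masked = round(mask_ratio * num_frames)
    masked_indices = set(rng.sample(range(num_frames), num_masked))

    # Hypothetical mask token: an all-zero vector. A real model would
    # more likely use a learned embedding instead.
    mask_token = [0.0] * len(poses[0])

    corrupted = [
        list(mask_token) if t in masked_indices else list(poses[t])
        for t in range(num_frames)
    ]
    is_masked = [t in masked_indices for t in range(num_frames)]
    return corrupted, is_masked


# Usage: a toy sequence of 8 frames with 3 values per frame.
toy_poses = [[float(t), float(t) + 0.5, 1.0] for t in range(8)]
corrupted, is_masked = mask_pose_sequence(toy_poses, 0.25, random.Random(0))
# The training target for the reconstruction would be `toy_poses` itself.
```

Because the corruption needs only clean pose sequences, this kind of objective can be set up from MoCap data alone, which is consistent with the abstract's statement that training requires no video annotations.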

Code & Models

Repositories

naver/posebert
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging

Methods: Test