Self-Attentive 3D Human Pose and Shape Estimation from Videos

Yun-Chun Chen; Marco Piccirilli; Robinson Piramuthu; Ming-Hsuan Yang

arXiv:2103.14182 · cs.CV · September 8, 2021

Open Access

TL;DR

This paper introduces a video-based approach for 3D human pose and shape estimation that leverages self-attention to incorporate temporal dependencies, resulting in more consistent and accurate predictions across frames.

Contribution

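The paper proposes a self-attention module that models both short-range and long-range dependencies across frames, together with a motion forecasting module that smooths the transition between adjacent frames, improving 3D human pose and shape estimation from videos over frame-based methods.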

Findings

01

Outperforms state-of-the-art methods on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.

02

Achieves more temporally coherent and accurate 3D pose estimations.

03

Demonstrates the effectiveness of self-attention in modeling temporal dependencies.
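
One way to make "temporally coherent" concrete is to measure how much predicted joints jitter from frame to frame. The sketch below is an illustration only and is not taken from the paper: the function name `mean_joint_acceleration`, the `(T, J, 3)` array layout, and the use of a second-order finite difference as a jitter proxy are all assumptions.

```python
import numpy as np

def mean_joint_acceleration(joints):
    """Mean magnitude of per-joint acceleration over a sequence.

    joints: array of shape (T, J, 3) holding 3D joint positions for
    T frames and J joints (hypothetical layout, assumed for this sketch).
    Acceleration is approximated by the second-order finite difference
    x[t+1] - 2*x[t] + x[t-1]; lower values mean smoother motion.
    """
    accel = joints[2:] - 2.0 * joints[1:-1] + joints[:-2]
    return float(np.linalg.norm(accel, axis=-1).mean())

# Synthetic check: constant-velocity motion versus the same motion with noise.
t = np.arange(10, dtype=float)[:, None, None]   # shape (10, 1, 1)
smooth = np.tile(t, (1, 4, 3))                  # shape (10, 4, 3)
rng = np.random.default_rng(0)
jittery = smooth + 0.05 * rng.standard_normal(smooth.shape)

smooth_score = mean_joint_acceleration(smooth)
jittery_score = mean_joint_acceleration(jittery)
```

Under these assumptions, a sequence predicted independently per frame would tend to score higher than one predicted with temporal context, because per-frame errors are uncorrelated across time.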

Abstract

We consider the task of estimating 3D human pose and shape from videos. While existing frame-based approaches have made significant progress, these methods are independently applied to each image, thereby often leading to inconsistent predictions. In this work, we present a video-based learning algorithm for 3D human pose and shape estimation. The key insights of our method are two-fold. First, to address the inconsistent temporal prediction issue, we exploit temporal information in videos and propose a self-attention module that jointly considers short-range and long-range dependencies across frames, resulting in temporally coherent estimations. Second, we model human motion with a forecasting module that allows the transition between adjacent frames to be smooth. We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets. Extensive experimental results show that our…
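
The idea of relating every frame to every other frame can be sketched with generic scaled dot-product self-attention over the time axis. This is a minimal NumPy illustration, not the authors' module: the function name `temporal_self_attention`, the single-head design, the feature dimensions, and the random projection matrices are all assumptions made for the example.

```python
import numpy as np

def temporal_self_attention(frame_feats, w_q, w_k, w_v):
    """Generic single-head self-attention across frames.

    frame_feats: (T, D) array, one feature vector per frame (assumed layout).
    w_q, w_k:    (D, d_k) projection matrices for queries and keys.
    w_v:         (D, d_v) projection matrix for values.
    Returns the attended features (T, d_v) and the (T, T) attention weights.
    """
    q = frame_feats @ w_q
    k = frame_feats @ w_k
    v = frame_feats @ w_v
    # Each row t scores frame t against all frames, near and far in time.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over the key (frame) axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical sizes: 16 frames with 32-dimensional per-frame features.
rng = np.random.default_rng(0)
T, D = 16, 32
feats = rng.standard_normal((T, D))
w_q, w_k, w_v = (0.1 * rng.standard_normal((D, D)) for _ in range(3))
out, attn = temporal_self_attention(feats, w_q, w_k, w_v)
```

Because the attention matrix is `(T, T)`, the output for each frame is a weighted mix of all frames in the clip, which is how a single layer can cover both adjacent and distant frames at once.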

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging