Self-Attentive 3D Human Pose and Shape Estimation from Videos
Yun-Chun Chen, Marco Piccirilli, Robinson Piramuthu, Ming-Hsuan Yang

TL;DR
This paper introduces a video-based approach for 3D human pose and shape estimation that leverages self-attention to incorporate temporal dependencies, resulting in more consistent and accurate predictions across frames.
Contribution
The paper proposes a self-attention module for temporal modeling and a motion forecasting module, improving video-based 3D human pose and shape estimation over previous frame-based methods.
Findings
Outperforms state-of-the-art methods on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
Achieves more temporally coherent and accurate 3D pose estimations.
Demonstrates the effectiveness of self-attention in modeling temporal dependencies.
Abstract
We consider the task of estimating 3D human pose and shape from videos. While existing frame-based approaches have made significant progress, these methods are independently applied to each image, thereby often leading to inconsistent predictions. In this work, we present a video-based learning algorithm for 3D human pose and shape estimation. The key insights of our method are two-fold. First, to address the inconsistent temporal prediction issue, we exploit temporal information in videos and propose a self-attention module that jointly considers short-range and long-range dependencies across frames, resulting in temporally coherent estimations. Second, we model human motion with a forecasting module that allows the transition between adjacent frames to be smooth. We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets. Extensive experimental results show that our…
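The self-attention idea described in the abstract can be illustrated with a minimal NumPy sketch. This is generic scaled dot-product attention applied across per-frame feature vectors, not the authors' implementation. The function name, the projection matrices, and the dimensions are assumptions chosen only for illustration.

```python
import numpy as np


def temporal_self_attention(features, w_q, w_k, w_v):
    """Generic scaled dot-product self-attention across the frames of one clip.

    features: (T, D) array, one D-dimensional feature vector per frame.
    w_q, w_k, w_v: (D, D_k) projection matrices (hypothetical parameters).
    Returns the attended features (T, D_k) and the (T, T) attention weights.
    """
    q = features @ w_q  # queries, one per frame
    k = features @ w_k  # keys, one per frame
    v = features @ w_v  # values, one per frame

    # Every frame scores every other frame, so both nearby and distant
    # frames can contribute to the output for a given frame.
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights


# Illustrative sizes only: 16 frames, 32-dim features, 8-dim projections.
rng = np.random.default_rng(0)
T, D, D_k = 16, 32, 8
frames = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D_k)) * 0.1 for _ in range(3))
attended, weights = temporal_self_attention(frames, w_q, w_k, w_v)
```

Each row of `weights` shows how much one frame draws on every other frame in the clip. That per-frame mixing is the mechanism by which attention can relate both short-range and long-range frames, under the assumptions stated above.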
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging
