Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video
Wen-Li Wei, Jen-Chun Lin, Tyng-Luh Liu, and Hong-Yuan Mark Liao

TL;DR
This paper introduces MPS-Net, a novel network that captures non-local temporal dependencies in monocular videos to improve 3D human pose and shape estimation, outperforming existing methods with fewer parameters.
Contribution
The paper proposes two modules within MPS-Net: a motion continuity attention (MoCA) module and a hierarchical attentive feature integration (HAFI) module. Together they model human motion continuity and temporal correlation, yielding more accurate 3D human pose and shape estimation from monocular video.
Findings
Outperforms state-of-the-art methods on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
Uses fewer network parameters than previous methods.
Achieves more accurate and temporally coherent 3D human pose and shape estimation.
Abstract
Learning to capture human motion is essential to 3D human pose and shape estimation from monocular video. However, the existing methods mainly rely on recurrent or convolutional operations to model such temporal information, which limits the ability to capture non-local context relations of human motion. To address this problem, we propose a motion pose and shape network (MPS-Net) to effectively capture humans in motion to estimate accurate and temporally coherent 3D human pose and shape from a video. Specifically, we first propose a motion continuity attention (MoCA) module that leverages visual cues observed from human motion to adaptively recalibrate the range that needs attention in the sequence to better capture the motion continuity dependencies. Then, we develop a hierarchical attentive feature integration (HAFI) module to effectively combine adjacent past and future feature…
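To make the idea of "non-local" temporal context concrete, here is a minimal sketch of plain scaled dot-product self-attention over a sequence of per-frame features, where every frame can attend to every other frame regardless of distance. It is not the MoCA or HAFI module described in the abstract, and it has no learned projections. The function name `temporal_self_attention`, the sequence length (16 frames), and the feature size (32) are assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_self_attention(features):
    """Generic non-local attention over time (illustrative, not MoCA).

    features: array of shape (T, D), one D-dimensional feature per frame.
    Returns the attended features (T, D) and the attention map (T, T).
    """
    num_frames, dim = features.shape
    # Pairwise similarity between every pair of frames, scaled by sqrt(D).
    scores = features @ features.T / np.sqrt(dim)
    # Each row is a distribution over all frames in the sequence.
    attn = softmax(scores, axis=-1)
    # Residual connection: original feature plus its attended context.
    out = features + attn @ features
    return out, attn


# Hypothetical input: 16 frames with 32-dimensional features each.
rng = np.random.default_rng(0)
frames = rng.standard_normal((16, 32))
out, attn = temporal_self_attention(frames)
```

Row `t` of `attn` shows how strongly frame `t` draws on every other frame. A recurrent or convolutional model would instead reach distant frames only through many intermediate steps, which is the limitation the abstract points to.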
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
