Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Ziniu Wan; Zhengjia Li; Maoqing Tian; Jianbo Liu; Shuai Yi; Hongsheng Li

arXiv:2109.02303 · cs.CV · September 7, 2021

TL;DR

This paper introduces the Multi-level Attention Encoder-Decoder Network (MAED), which models spatial, temporal, and joint-level relations to improve 3D human shape and pose estimation, especially in hard scenarios such as cluttered backgrounds, occlusion, and extreme poses.

Contribution

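MAED integrates multi-level attention into a single encoder-decoder architecture for 3D human shape and pose estimation: a Spatial-Temporal Encoder (STE) captures spatial and temporal attention, and a Kinematic Topology Decoder (KTD) captures joint-level attention.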

Findings

1. Outperforms previous state-of-the-art methods on the 3DPW, MPI-INF-3DHP, and Human3.6M benchmarks.

2. Reduces PA-MPJPE by 6.2, 7.2, and 2.4 mm on 3DPW, MPI-INF-3DHP, and Human3.6M, respectively.

3. Effectively models multi-level relations for accurate pose estimation.
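PA-MPJPE (Procrustes-aligned mean per-joint position error) is MPJPE computed after aligning the predicted joints to the ground truth with the similarity transform (scale, rotation, translation) that minimizes squared error. Below is a minimal NumPy sketch of this common definition. The function names are hypothetical, and the benchmark protocols used in the paper's evaluation (joint sets, frame sampling) are not reproduced. With joints given in millimetres, the result is in millimetres, the unit of the reported gains.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance over joints; pred and gt have shape (J, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Map pred onto gt with the least-squares similarity transform
    (orthogonal Procrustes / Umeyama solution)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g              # centre both point sets
    u, s, vt = np.linalg.svd(p.T @ g)          # SVD of the 3x3 cross-covariance
    # Flip the axis of the smallest singular value if needed so that
    # det(R) = +1, i.e. a proper rotation rather than a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    D = np.diag([1.0, 1.0, d])
    rot = u @ D @ vt                           # applied to row vectors: p @ rot
    scale = (s * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Usage: a scaled, rotated, translated copy of the ground truth has a large
# MPJPE but (numerically) zero PA-MPJPE.
rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3))
theta = 0.5
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
pred = 1.5 * gt @ rot_z + np.array([0.1, -0.2, 0.3])
print(mpjpe(pred, gt) > 0.1, pa_mpjpe(pred, gt) < 1e-9)  # prints: True True
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE isolates errors in articulated pose and body shape from errors in camera-relative placement.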

Abstract

3D human shape and pose estimation is an essential task for human motion analysis and is widely used in many 3D applications. However, existing methods cannot simultaneously capture relations at multiple levels, including the spatial-temporal level and the human joint level. Therefore, they fail to make accurate predictions in hard scenarios such as cluttered backgrounds, occlusion, or extreme poses. To this end, we propose the Multi-level Attention Encoder-Decoder Network (MAED), which includes a Spatial-Temporal Encoder (STE) and a Kinematic Topology Decoder (KTD) to model multi-level attention in a unified framework. STE consists of a series of cascaded blocks based on Multi-Head Self-Attention, and each block uses two parallel branches to learn spatial and temporal attention, respectively. Meanwhile, KTD aims at modeling joint-level attention. It regards pose estimation as a…
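The STE design (cascaded blocks, each with parallel spatial and temporal Multi-Head Self-Attention branches) can be illustrated with a minimal NumPy sketch. It is not the code in the ziniuwan/maed repository, and every name is hypothetical. Assumptions: a video clip is represented as T frames × N spatial tokens × C channels; the spatial branch attends across the N tokens within each frame, and the temporal branch attends across the T frames at each token position. The fusion rule (averaging the two branches, then adding a residual) is our assumption. Normalization, feed-forward layers, and positional embeddings are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_qkv, w_out, num_heads):
    """Scaled dot-product self-attention over the second-to-last axis of x.

    x: (..., L, C); w_qkv: (C, 3C); w_out: (C, C). Leading axes are batch axes.
    """
    *batch, L, C = x.shape
    d = C // num_heads
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)            # each (..., L, C)

    def heads(t):                                        # (..., L, C) -> (..., H, L, d)
        return np.swapaxes(t.reshape(*batch, L, num_heads, d), -2, -3)

    q, k, v = heads(q), heads(k), heads(v)
    attn = softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(d))   # (..., H, L, L)
    out = np.swapaxes(attn @ v, -2, -3).reshape(*batch, L, C)
    return out @ w_out

def ste_block(x, params, num_heads=2):
    """One hypothetical STE-style block; x has shape (T, N, C)."""
    # Spatial branch: frames act as the batch, attention runs over the N tokens.
    spatial = multi_head_self_attention(x, *params["spatial"], num_heads)
    # Temporal branch: tokens act as the batch, attention runs over the T frames.
    xt = np.swapaxes(x, 0, 1)                            # (N, T, C)
    temporal = np.swapaxes(
        multi_head_self_attention(xt, *params["temporal"], num_heads), 0, 1)
    # ASSUMED fusion: average the parallel branches and add a residual connection.
    return x + 0.5 * (spatial + temporal)

def spatial_temporal_encoder(x, blocks, num_heads=2):
    """Cascade of blocks; each entry of `blocks` holds one block's weights."""
    for params in blocks:
        x = ste_block(x, params, num_heads)
    return x

# Usage with random weights: 4 frames, 6 tokens per frame, 8 channels, 3 blocks.
rng = np.random.default_rng(0)
T, N, C, depth = 4, 6, 8, 3

def init():
    return rng.normal(0.0, 0.1, (C, 3 * C)), rng.normal(0.0, 0.1, (C, C))

blocks = [{"spatial": init(), "temporal": init()} for _ in range(depth)]
x = rng.normal(size=(T, N, C))
y = spatial_temporal_encoder(x, blocks)
print(y.shape)  # (4, 6, 8)
```

Running the two branches in parallel lets each block mix information both within a frame and across frames while keeping each attention map small (N×N and T×T rather than one (TN)×(TN) map).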

Code & Models

Repositories

ziniuwan/maed (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods