Direct Multi-view Multi-person 3D Pose Estimation

Tao Wang; Jianfeng Zhang; Yujun Cai; Shuicheng Yan; Jiashi Feng

arXiv:2111.04076 · cs.CV · November 30, 2021 · 83 cites

TL;DR

The paper introduces MvP, a novel multi-view transformer-based method for direct, efficient, and accurate multi-person 3D pose estimation from images, outperforming previous approaches on key benchmarks.

Contribution

It proposes a transformer-based framework with hierarchical query embeddings, projective attention, and RayConv, enabling direct 3D pose regression without intermediate steps.
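As the name suggests, projective attention depends on relating a 3D location to each camera view. The sketch below shows only that geometric step: a standard pinhole projection of one 3D joint estimate into several views. It is not the authors' implementation. The helper names (`project_point`, `project_to_views`), the zero-skew intrinsics, and the toy camera values are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Camera = Tuple[Matrix3, Matrix3, Sequence[float]]  # (K, R, t), hypothetical layout


def project_point(K: Matrix3, R: Matrix3, t: Sequence[float],
                  X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D world point X to pixel coordinates for one pinhole camera."""
    # World -> camera coordinates: Xc = R * X + t
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide, then focal lengths and principal point (zero skew assumed).
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return (u, v)


def project_to_views(cameras: Sequence[Camera],
                     X: Sequence[float]) -> List[Tuple[float, float]]:
    """Project the same 3D point into every calibrated view."""
    return [project_point(K, R, t, X) for (K, R, t) in cameras]


# Toy setup (assumed values): two cameras sharing intrinsics, offset along x.
K = [[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(K, I3, [0.0, 0.0, 0.0]), (K, I3, [0.5, 0.0, 0.0])]

joint = [0.1, -0.2, 2.0]  # current 3D estimate of one joint, in metres
anchors = project_to_views(cameras, joint)  # one 2D anchor point per view
```

In a pipeline of this kind, each per-view 2D point could serve as the location around which image features are gathered for that joint query.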

Findings

01

Achieves 92.3% AP25 on the Panoptic dataset, surpassing previous methods.

02

Outperforms prior state-of-the-art methods in accuracy while being more efficient.

03

Extensible to human mesh recovery with the SMPL model.

Abstract

We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images. Instead of estimating 3D joint locations from a costly volumetric representation or reconstructing the per-person 3D pose from multiple detected 2D poses as in previous methods, MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks. Specifically, MvP represents skeleton joints as learnable query embeddings and lets them progressively attend to and reason over the multi-view information from the input images to directly regress the actual 3D joint locations. To improve the accuracy of such a simple pipeline, MvP presents a hierarchical scheme to concisely represent query embeddings of multi-person skeleton joints and introduces an input-dependent query adaptation approach. Further, MvP designs a novel geometrically guided…
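One plausible reading of the hierarchical query scheme is that each joint query is composed from a person-level embedding and a joint-level embedding, so that far fewer vectors need to be stored than one per (person, joint) pair. The sketch below illustrates that idea only. The additive composition, the random initialisation, the function names, and the sizes (10 persons, 15 joints, dimension 8) are assumptions, not details taken from the abstract.

```python
import random
from typing import List

Vector = List[float]


def make_embeddings(count: int, dim: int, rng: random.Random) -> List[Vector]:
    """Create `count` small random vectors standing in for learnable embeddings."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(count)]


def compose_joint_queries(person_emb: List[Vector],
                          joint_emb: List[Vector]) -> List[Vector]:
    """Build one query per (person, joint) pair by summing the two levels."""
    queries = []
    for p in person_emb:          # person-level embedding
        for j in joint_emb:       # joint-level embedding, shared across persons
            queries.append([a + b for a, b in zip(p, j)])
    return queries


rng = random.Random(0)
num_persons, num_joints, dim = 10, 15, 8  # assumed sizes, for illustration only

person_emb = make_embeddings(num_persons, dim, rng)
joint_emb = make_embeddings(num_joints, dim, rng)
queries = compose_joint_queries(person_emb, joint_emb)

# Stored vectors grow as num_persons + num_joints,
# while the number of queries grows as num_persons * num_joints.
stored_vectors = len(person_emb) + len(joint_emb)
```

With these sizes, 25 stored vectors yield 150 queries, and the joint-level embeddings are shared across all person instances.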

Code & Models

Videos

Direct Multi-view Multi-person 3D Pose Estimation · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Diabetic Foot Ulcer Assessment and Management