Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos

Haoyuan Li; Haoye Dong; Hanchao Jia; Dong Huang; Michael C. Kampffmeyer; Liang Lin; Xiaodan Liang

arXiv:2308.10334 · cs.CV · August 22, 2023

TL;DR

This paper introduces CoordFormer, a novel end-to-end model that directly captures multi-person spatial-temporal relations for 3D mesh recovery from videos, significantly improving accuracy and speed over previous methods.

Contribution

The paper proposes CoordFormer with Coordinate-Aware Attention and Body Center Attention mechanisms for improved multi-person mesh recovery in videos, addressing limitations of multi-stage approaches.

Findings

1. Outperforms the previous state of the art by 4.2%, 8.8%, and 4.7% on the MPJPE, PA-MPJPE, and PVE metrics, respectively.
2. Runs 40% faster than recent video-based methods.
3. Models inter-person interactions and temporal dynamics within a single end-to-end framework.
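The findings are reported in MPJPE, PA-MPJPE and PVE. As a reading aid, here is a minimal NumPy sketch of the standard definitions of these metrics; it is not the paper's evaluation code. The function names are our own, and benchmark protocols usually add steps this sketch omits, such as aligning prediction and ground truth at the root joint first and reporting results in millimetres.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance between matching rows of two (N, 3) point sets."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def similarity_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align pred to gt with the best-fitting scale, rotation and translation
    (orthogonal Procrustes / Umeyama solution via SVD)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)       # SVD of the 3x3 cross-covariance
    d = np.sign(np.linalg.det(u @ vt))      # d = -1 would mean a reflection
    fix = np.diag([1.0, 1.0, d])
    rot = u @ fix @ vt                      # rotation, applied as p @ rot
    scale = np.sum(s * np.diag(fix)) / np.sum(p ** 2)
    return scale * p @ rot + mu_g


def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes (similarity) alignment of pred onto gt."""
    return mpjpe(similarity_align(pred, gt), gt)


def pve(pred_verts: np.ndarray, gt_verts: np.ndarray) -> float:
    """Per-vertex error: the MPJPE formula applied to (V, 3) mesh vertices."""
    return mpjpe(pred_verts, gt_verts)


# Usage: a rotated, scaled and shifted copy of a random "ground truth" pose.
rng = np.random.default_rng(0)
gt = rng.normal(size=(24, 3))
cos_t, sin_t = np.cos(np.pi / 6), np.sin(np.pi / 6)
rot_z = np.array([[cos_t, -sin_t, 0.0], [sin_t, cos_t, 0.0], [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ rot_z + np.array([0.1, -0.2, 0.3])
print(mpjpe(pred, gt))     # large: scale, rotation and offset all count as error
print(pa_mpjpe(pred, gt))  # close to 0: the similarity transform is factored out
```

The example shows why PA-MPJPE is usually much lower than MPJPE: Procrustes alignment removes global scale, rotation and translation errors, so it isolates errors in body pose.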

Abstract

Multi-person 3D mesh recovery from videos is a critical first step towards automatic perception of group behavior in virtual reality, physical therapy and beyond. However, existing approaches rely on multi-stage paradigms, where the person detection and tracking stages are performed in a multi-person setting, while temporal dynamics are only modeled for one person at a time. Consequently, their performance is severely limited by the lack of inter-person interactions in the spatial-temporal mesh recovery, as well as by detection and tracking defects. To address these challenges, we propose the Coordinate transFormer (CoordFormer) that directly models multi-person spatial-temporal relations and simultaneously performs multi-mesh recovery in an end-to-end manner. Instead of partitioning the feature map into coarse-scale patch-wise tokens, CoordFormer leverages a novel Coordinate-Aware…
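The abstract contrasts coarse patch-wise tokens with a finer-grained alternative, but the description of Coordinate-Aware Attention is cut off above. The sketch below illustrates only that contrast, under our own assumptions: the function names, the normalized (t, y, x) coordinate channels and the single-head attention with random weights are hypothetical and are not CoordFormer's actual design.

```python
import numpy as np


def patch_tokens(feat: np.ndarray, patch: int) -> np.ndarray:
    """ViT-style tokenizer: cut a (C, H, W) map into non-overlapping
    patch x patch tiles, one token (C * patch * patch values) per tile."""
    c, h, w = feat.shape
    assert h % patch == 0 and w % patch == 0
    x = feat.reshape(c, h // patch, patch, w // patch, patch)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/p, W/p, C, p, p)
    return x.reshape((h // patch) * (w // patch), c * patch * patch)


def pixel_tokens_with_coords(clip: np.ndarray) -> np.ndarray:
    """Hypothetical pixel-level tokenizer for a (T, C, H, W) clip: one token per
    (t, y, x) location, with normalized (t, y, x) coordinates appended."""
    t, c, h, w = clip.shape
    grids = np.meshgrid(np.linspace(0, 1, t), np.linspace(0, 1, h),
                        np.linspace(0, 1, w), indexing="ij")
    coords = np.stack(grids, axis=-1)            # (T, H, W, 3)
    feats = clip.transpose(0, 2, 3, 1)           # (T, H, W, C)
    return np.concatenate([feats, coords], axis=-1).reshape(t * h * w, c + 3)


def self_attention(tokens: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Single-head scaled dot-product attention with random projections:
    every token (any pixel, any frame) attends to every other token."""
    d = tokens.shape[1]
    wq, wk, wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    logits = q @ k.T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
clip = rng.normal(size=(4, 8, 16, 16))           # 4 frames, 8 channels, 16x16
print(patch_tokens(clip[0], 8).shape)            # (4, 512): 4 coarse tokens per frame
tokens = pixel_tokens_with_coords(clip)
print(tokens.shape)                              # (1024, 11): 4*16*16 tokens
print(self_attention(tokens, rng).shape)         # (1024, 11)
```

The trade-off is visible in the shapes. Patch tokens collapse each 8x8 tile into a single token, so positions inside a tile are no longer separate tokens. Pixel tokens keep every location addressable and tag it with its position in space and time, but full attention over them scales quadratically with the token count.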

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging