A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion
Junkun Jiang, Jie Chen, Yike Guo

TL;DR
This paper introduces a dual-masked auto-encoder framework utilizing transformer-based skeletal motion modeling to improve robustness in multi-person motion capture, especially under severe occlusion and data loss.
Contribution
It proposes an adaptive, identity-aware triangulation module and a novel Dual-Masked Auto-Encoder (D-MAE) for reconstructing complete 3D skeletal motion under challenging conditions.
Findings
Outperforms state-of-the-art methods on benchmark datasets.
Effective in scenarios with severe occlusion and missing data.
Introduces a new high-accuracy multi-person motion capture dataset.
Abstract
Multi-person motion capture can be challenging due to ambiguities caused by severe occlusion, fast body movement, and complex interactions. Existing frameworks build on 2D pose estimation and triangulate to 3D coordinates by reasoning about the appearance, trajectory, and geometric consistency among multi-camera observations. However, 2D joint detection is usually incomplete and suffers from wrong identity assignments due to limited observation angles, which leads to noisy 3D triangulation results. To overcome this issue, we propose to explore the short-range autoregressive characteristics of skeletal motion using a transformer. First, we propose an adaptive, identity-aware triangulation module to reconstruct 3D joints and identify the missing joints for each identity. To generate complete 3D skeletal motion, we then propose a Dual-Masked Auto-Encoder (D-MAE) which encodes the joint status with both…
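To make the triangulation step concrete, here is a minimal sketch of standard linear (DLT) triangulation of a single joint from the cameras that detected it. It is not the paper's adaptive, identity-aware module: the function names `triangulate_joint` and `project`, the toy camera matrices, and the rule of returning `None` when fewer than two views see the joint are all assumptions made for illustration.

```python
import numpy as np

def triangulate_joint(projections, points_2d, visible):
    """Linear (DLT) triangulation of one joint from the views that detected it.

    projections: list of 3x4 camera projection matrices
    points_2d:   list of (u, v) detections, one per camera
    visible:     list of bools; False marks a missed 2D detection
    Returns a 3-vector, or None when fewer than two views see the joint
    (assumption: such joints are flagged as missing for later completion).
    """
    rows = []
    for P, (u, v), seen in zip(projections, points_2d, visible):
        if not seen:
            continue
        # Each visible view contributes two linear constraints on the point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    if len(rows) < 4:
        return None
    # The homogeneous solution is the right singular vector associated
    # with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 matrix (hypothetical test helper)."""
    x = P @ np.append(X, 1.0)
    return x[0] / x[2], x[1] / x[2]

# Three toy cameras that differ only by a translation along the x axis.
cams = [np.hstack([np.eye(3), np.array([[tx], [0.0], [0.0]])])
        for tx in (0.0, -1.0, 1.0)]
joint = np.array([0.2, -0.1, 4.0])
pixels = [project(P, joint) for P in cams]

# The second camera is treated as occluded; two views still suffice.
estimate = triangulate_joint(cams, pixels, [True, False, True])
```

With noise-free detections the estimate matches the original joint. A joint seen by only one camera cannot be triangulated, and that kind of gap is what a completion model such as D-MAE is meant to fill.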
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Video Analysis and Summarization
