Every Angle Is Worth A Second Glance: Mining Kinematic Skeletal Structures from Multi-view Joint Cloud

Junkun Jiang, Jie Chen, Ho Yin Au, Mingyuan Chen, Wei Xue, and Yike Guo

arXiv:2502.02936·cs.CV·February 6, 2025

Open Access

TL;DR

This paper introduces a novel transformer-based framework, JCSAT, for multi-view multi-person 3D motion capture that effectively handles occlusions and joint association by leveraging a comprehensive joint cloud and a new selection mechanism.

Contribution

The paper proposes the Joint Cloud Selection and Aggregation Transformer (JCSAT) with OTAP for improved 3D pose estimation from multi-view data, and introduces the BUMocap-X dataset for complex scenarios.

Findings

01. JCSAT outperforms existing methods on benchmark datasets.

02. The framework effectively handles severe occlusions and complex interactions.

03. The new dataset BUMocap-X provides a challenging benchmark for future research.

Abstract

Multi-person motion capture over sparse angular observations is a challenging problem under interference from both self- and mutual-occlusions. Existing works produce accurate 2D joint detections; however, when these are triangulated and lifted into 3D, available solutions all struggle to select the most accurate candidates and associate them with the correct joint type and target identity. To fully utilize all accurate 2D joint location information, we propose to independently triangulate between all same-typed 2D joints from all camera views regardless of their target ID, forming the Joint Cloud. The Joint Cloud consists of both valid joints lifted from the same joint type and target ID, as well as falsely constructed ones that are from different 2D sources. These redundant and inaccurate candidates are processed over the proposed Joint Cloud Selection and Aggregation…

Taxonomy

Topics: Image Processing and 3D Reconstruction · Human Pose and Action Recognition · Medical Imaging and Analysis

Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam