(Fusionformer): Exploiting the Joint Motion Synergy with Fusion Network Based On Transformer for 3D Human Pose Estimation
Xinwei Yu, Xiaohua Zhang

TL;DR
Fusionformer is a novel transformer-based approach for 3D human pose estimation that effectively models joint motion trajectories and fuses global and local features, improving accuracy on benchmark datasets.
Contribution
The paper introduces Fusionformer, which incorporates self-trajectory and mutual-trajectory modules to better capture joint motion dynamics in 3D pose estimation.
Findings
Achieves 2.4% lower MPJPE on Human3.6M
Achieves 4.3% lower P-MPJPE on Human3.6M
Outperforms the baseline PoseFormer method
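For readers unfamiliar with the two metrics above, the following is a minimal NumPy sketch of how MPJPE and P-MPJPE are commonly computed for a single pose. It is not taken from the paper's code: the function names, the `(J, 3)` array layout, and the single-pose (unbatched) form are assumptions made for illustration.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints.

    pred, gt: arrays of shape (J, 3) in the same units (e.g. millimetres).
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


def p_mpjpe(pred, gt):
    """MPJPE after a similarity (Procrustes) alignment of pred onto gt.

    The alignment solves for the scale, rotation and translation that best
    map the predicted joints onto the ground truth in the least-squares sense.
    """
    mu_p = pred.mean(axis=0, keepdims=True)
    mu_g = gt.mean(axis=0, keepdims=True)
    p0 = pred - mu_p  # centred prediction
    g0 = gt - mu_g    # centred ground truth

    # Optimal rotation from the SVD of the cross-covariance matrix.
    H = g0.T @ p0
    U, s, Vt = np.linalg.svd(H)
    V = Vt.T.copy()
    R = V @ U.T

    # Avoid reflections: force det(R) = +1.
    if np.linalg.det(R) < 0:
        V[:, -1] *= -1
        s[-1] *= -1
        R = V @ U.T

    scale = s.sum() / (p0 ** 2).sum()
    aligned = scale * p0 @ R + mu_g
    return mpjpe(aligned, gt)
```

Because P-MPJPE removes global scale, rotation and translation before measuring error, it is never larger than MPJPE for the same pose pair and isolates errors in the pose's shape.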
Abstract
For the current 3D human pose estimation task, a group of methods mainly learn the rules of 2D-3D projection from spatial and temporal correlation. However, earlier methods model the global features of all body joints in the time domain but ignore the motion trajectories of individual joints. The recent work [29] considers that there are differences in motion between different joints and deals with the temporal relationship of each joint separately. However, we found that different joints show the same movement trends under some specific actions. Therefore, our proposed Fusionformer method introduces a self-trajectory module and a mutual-trajectory module based on the spatio-temporal module. After that, the global spatio-temporal features and local joint trajectory features are fused through a linear network in a parallel manner. To eliminate the influence of bad 2D poses on 3D…
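As a rough illustration of what fusing global and local features "through a linear network in a parallel manner" could look like, here is a minimal NumPy sketch. It is only one possible reading of the abstract, not the authors' implementation: the function name `fuse_parallel`, the feature shapes, the concatenate-then-project design, and all sizes are hypothetical.

```python
import numpy as np


def fuse_parallel(global_feat, local_feat, weight, bias):
    """Fuse two feature branches computed in parallel with one linear layer.

    global_feat: (T, D)    per-frame whole-body features (assumed shape)
    local_feat:  (T, J, d) per-frame, per-joint trajectory features (assumed shape)
    weight:      (D + J * d, OUT) linear projection
    bias:        (OUT,)
    """
    frames = global_feat.shape[0]
    local_flat = local_feat.reshape(frames, -1)           # (T, J * d)
    combined = np.concatenate([global_feat, local_flat], axis=-1)
    return combined @ weight + bias                       # (T, OUT)


# Hypothetical sizes chosen only for the example.
T, J, D, d, OUT = 9, 17, 32, 4, 32
rng = np.random.default_rng(0)
global_feat = rng.normal(size=(T, D))
local_feat = rng.normal(size=(T, J, d))
weight = rng.normal(size=(D + J * d, OUT)) * 0.01
bias = np.zeros(OUT)

fused = fuse_parallel(global_feat, local_feat, weight, bias)
print(fused.shape)  # (9, 32)
```

In this sketch the two branches do not depend on each other, so they can be computed side by side and merged only at the final projection.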
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Gait Recognition and Analysis
