TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation

Haoyu Ma; Liangjian Chen; Deying Kong; Zhe Wang; Xingwei Liu; Hao Tang; Xiangyi Yan; Yusheng Xie; Shih-Yao Lin; Xiaohui Xie

arXiv:2110.09554 · cs.CV · December 10, 2021 · 36 citations

TL;DR

TransFusion introduces a transformer-based framework for multi-view 3D human pose estimation that directly enhances 2D pose predictions by integrating cross-view information, leading to improved accuracy and efficiency.

Contribution

The paper proposes TransFusion, a novel transformer architecture with epipolar field encoding for effective multi-view feature fusion in 3D human pose estimation.

Findings

01. Achieves 25.8 mm MPJPE on the Human3.6M dataset.

02. Outperforms existing fusion methods in efficiency and accuracy.

03. Uses only 5 million parameters at 256×256 resolution.
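
The MPJPE figure above is the mean per-joint position error: the Euclidean distance between each predicted 3D joint and its ground-truth position, averaged over all joints. The snippet below is a minimal sketch of that definition only, not the paper's evaluation code; the function name `mpjpe` and the toy coordinates are illustrative assumptions, and it does no alignment or averaging over frames.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error for one pose.

    pred, gt: sequences of (x, y, z) joint positions in the same unit
    (e.g. millimetres) and the same joint order.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    # Euclidean distance per joint, then the mean over joints.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy example (made-up coordinates): one joint is exact, one is 5 mm off.
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(predicted, ground_truth))  # 2.5
```

Benchmark results are typically reported as this value averaged over all evaluated poses.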

Abstract

Estimating the 2D human poses in each view is typically the first step in calibrated multi-view 3D pose estimation. But the performance of 2D pose detectors suffers from challenging situations such as occlusions and oblique viewing angles. To address these challenges, previous works derive point-to-point correspondences between different views from epipolar geometry and utilize the correspondences to merge prediction heatmaps or feature representations. Instead of post-prediction merge/calibration, here we introduce a transformer framework for multi-view 3D pose estimation, aiming at directly improving individual 2D predictors by integrating information from different views. Inspired by previous multi-modal transformers, we design a unified transformer architecture, named TransFusion, to fuse cues from both current views and neighboring views. Moreover, we propose the concept of…
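
The epipolar geometry that the abstract attributes to previous works rests on the standard constraint x'ᵀ F x = 0: a pixel x in one calibrated view can only correspond to pixels on the line F x in the other view. The sketch below illustrates that constraint alone and is not TransFusion's fusion mechanism. The function names, the convention that `F` maps view-1 points to view-2 lines, and the idealised rectified-pair matrix `F_RECTIFIED` are all assumptions made for illustration.

```python
import math


def epipolar_line(F, point):
    """Return the line (a, b, c) in view 2 for a pixel (u, v) in view 1.

    Assumes the convention x2^T F x1 = 0, so the line is F @ [u, v, 1].
    """
    u, v = point
    x = (u, v, 1.0)
    return tuple(sum(F[r][c] * x[c] for c in range(3)) for r in range(3))


def point_line_distance(line, point):
    """Perpendicular pixel distance from (u, v) to the line a*u + b*v + c = 0."""
    a, b, c = line
    u, v = point
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate line")
    return abs(a * u + b * v + c) / norm


# Hypothetical fundamental matrix for an ideal rectified stereo pair
# (pure horizontal translation): epipolar lines are the image rows.
F_RECTIFIED = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]

line = epipolar_line(F_RECTIFIED, (10.0, 20.0))
print(line)                                     # (0.0, -1.0, 20.0), i.e. v = 20
print(point_line_distance(line, (50.0, 23.0)))  # 3.0 pixels off the line
```

A candidate match in the second view that lies far from this line cannot be the same 3D point, which is what lets fusion methods restrict where they look for corresponding evidence.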

Code & Models

Repositories

howiema/transfusion-pose (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods