UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning
Zhongyu Jiang, Wenhao Chai, Lei Li, Zhuoran Zhou, Cheng-Yen Yang, Jenq-Neng Hwang

TL;DR
UniHPE introduces a unified framework that aligns multiple human pose estimation modalities using contrastive learning, significantly improving accuracy in 2D and 3D pose estimation tasks.
Contribution
The paper presents a unified pipeline and a singular-value-based contrastive loss that simultaneously align three modalities: 2D human pose, lifting-based 3D human pose, and image-based 3D human pose.
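To make the idea of contrastive alignment concrete, the sketch below pulls matching embeddings from different modalities together and pushes non-matching ones apart. It is not the paper's singular-value-based loss, whose exact form is not given in this summary. It swaps in a standard symmetric InfoNCE (CLIP-style) loss applied to every pair of modalities. The function names, the modality keys, and the temperature value are assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE between two (N, D) embedding batches.

    Row i of emb_a and row i of emb_b are assumed to describe the same
    sample (the positive pair); every other row acts as a negative.
    """
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    logits = a @ b.T / temperature            # (N, N) similarity matrix
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b retrieval
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a retrieval
    return float(0.5 * (loss_ab + loss_ba))


def pairwise_alignment_loss(embeddings, temperature=0.07):
    """Average the symmetric loss over every pair of modalities.

    `embeddings` maps a hypothetical modality name to an (N, D) array.
    """
    names = list(embeddings)
    losses = [
        symmetric_info_nce(embeddings[p], embeddings[q], temperature)
        for i, p in enumerate(names)
        for q in names[i + 1:]
    ]
    return float(np.mean(losses))


# Toy usage: three modalities whose embeddings are noisy copies of one another.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
toy = {
    "pose_2d": base + 0.01 * rng.normal(size=base.shape),
    "pose_3d_lifted": base + 0.01 * rng.normal(size=base.shape),
    "image": base + 0.01 * rng.normal(size=base.shape),
}
aligned_loss = pairwise_alignment_loss(toy)
```

With three modalities, the pairwise form produces three loss terms per batch. Whether UniHPE combines its modalities pairwise or jointly is not stated in this summary, so treat that choice as part of the illustration.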
Findings
Achieves MPJPE of 50.5 mm on Human3.6M
Achieves PA-MPJPE of 51.6 mm on 3DPW
Demonstrates improved multi-modal pose estimation performance.
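For readers unfamiliar with the two metrics above: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PA-MPJPE is the same distance after the prediction has been aligned to the ground truth by a similarity transform (Procrustes alignment). The sketch below computes both for a single pose of shape (J, 3). The function names are illustrative, and the paper's exact evaluation protocol (joint set, root alignment, averaging over frames) is not given in this summary, so none of that is reproduced here.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error, in the units of the inputs (e.g. mm)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # SVD of the cross-covariance gives the optimal rotation (Kabsch/Umeyama).
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    signs = np.array([1.0, 1.0, d])
    rot = vt.T @ np.diag(signs) @ u.T

    scale = (s * signs).sum() / (p ** 2).sum()
    return scale * p @ rot.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)


# Toy usage: a prediction that is a scaled, rotated, shifted copy of the truth.
rng = np.random.default_rng(0)
gt_pose = rng.normal(size=(17, 3)) * 100.0           # 17 joints, values in mm
theta = np.deg2rad(30.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
pred_pose = 1.2 * gt_pose @ rot_z.T + np.array([50.0, -20.0, 10.0])

raw_error = mpjpe(pred_pose, gt_pose)         # large: global pose differs
aligned_error = pa_mpjpe(pred_pose, gt_pose)  # near zero: shape is identical
```

Because PA-MPJPE removes global scale, rotation, and translation, it measures only errors in the articulated shape of the pose, which is why it is usually lower than MPJPE for the same predictions.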
Abstract
In recent times, there has been a growing interest in developing effective perception techniques for combining information from multiple modalities. This involves aligning features obtained from diverse sources to enable more efficient training with larger datasets and constraints, as well as leveraging the wealth of information contained in each modality. 2D and 3D Human Pose Estimation (HPE) are two critical perceptual tasks in computer vision, which have numerous downstream applications, such as Action Recognition, Human-Computer Interaction, Object tracking, etc. Yet, there are limited instances where the correlation between Image and 2D/3D human pose has been clearly researched using a contrastive paradigm. In this paper, we propose UniHPE, a unified Human Pose Estimation pipeline, which aligns features from all three modalities, i.e., 2D human pose estimation, lifting-based and…
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging
Methods: ALIGN · Contrastive Learning
