SpatioTemporal Learning for Human Pose Estimation in Sparsely-Labeled Videos

Yingying Jiao; Zhigang Wang; Sifan Wu; Shaojing Fan; Zhenguang Liu; Zhuoyue Xu; Zheqi Wu

arXiv:2501.15073·cs.CV·January 28, 2025

TL;DR

STDPose is a new framework that improves human pose estimation in sparsely-labeled videos by capturing long-range motion and spatiotemporal dynamics, achieving high accuracy with limited labeled data.

Contribution

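STDPose introduces a Dynamic-Aware Mask that captures long-range motion context, together with a system for encoding and aggregating spatiotemporal representations and motion dynamics, setting new benchmarks in pose estimation with minimal labeled data.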

Findings

01 Outperforms existing methods on large-scale datasets

02 Achieves competitive results with only 26.7% labeled data

03 Establishes new benchmarks for pose propagation and estimation

Abstract

Human pose estimation in videos remains a challenge, largely due to the reliance on extensive manual annotation of large datasets, which is expensive and labor-intensive. Furthermore, existing approaches often struggle to capture long-range temporal dependencies and overlook the complementary relationship between temporal pose heatmaps and visual features. To address these limitations, we introduce STDPose, a novel framework that enhances human pose estimation by learning spatiotemporal dynamics in sparsely-labeled videos. STDPose incorporates two key innovations: 1) A novel Dynamic-Aware Mask to capture long-range motion context, allowing for a nuanced understanding of pose changes. 2) A system for encoding and aggregating spatiotemporal representations and motion dynamics to effectively model spatiotemporal relationships, improving the accuracy and robustness of pose estimation.…

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging