Learning semantical dynamics and spatiotemporal collaboration for human pose estimation in video

Runyang Feng; Haoming Chen

arXiv:2502.10616 · cs.CV · February 18, 2025

TL;DR

This paper introduces a framework for human pose estimation in video that learns multi-level semantic dynamics and dense spatio-temporal collaboration. It targets two limitations of pixel-level methods such as optical flow and temporal difference: they neglect semantic correlations among frames, and they are vulnerable to image quality degradation.

Contribution

The paper proposes a Multi-Level Semantic Motion Encoder and a Spatial-Motion Mutual Learning module to model semantic dynamics and to integrate motion and spatial features in video pose estimation.

Findings

01. Achieves state-of-the-art results on PoseTrack datasets.

02. Effectively models semantic relationships across frames.

03. Enhances robustness to occlusions and image quality issues.

Abstract

Temporal modeling and spatio-temporal collaboration are pivotal techniques for video-based human pose estimation. Most state-of-the-art methods adopt optical flow or temporal difference, learning local visual content correspondence across frames at the pixel level, to capture motion dynamics. However, such a paradigm essentially relies on localized pixel-to-pixel similarity, which neglects the semantical correlations among frames and is vulnerable to image quality degradations (e.g. occlusions or blur). Moreover, existing approaches often combine motion and spatial (appearance) features via simple concatenation or summation, leading to practical challenges in fully leveraging these distinct modalities. In this paper, we present a novel framework that learns multi-level semantical dynamics and dense spatio-temporal collaboration for multi-frame human pose estimation. Specifically, we…
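The pixel-level baseline that the abstract critiques can be illustrated with a small sketch. The code below is not the authors' method: it only shows, on random arrays, what a temporal-difference motion cue and the "simple concatenation or summation" fusion look like. All function names, the array layout `(T, H, W, C)`, and the use of raw frames as stand-ins for spatial features are assumptions made for illustration.

```python
import numpy as np


def temporal_difference(frames):
    """Pixel-level motion cue: subtract each frame from the next one.

    frames: array of shape (T, H, W, C); returns shape (T - 1, H, W, C).
    """
    frames = np.asarray(frames, dtype=np.float32)
    return frames[1:] - frames[:-1]


def fuse_by_concatenation(spatial, motion):
    """Stack spatial and motion features along the channel axis."""
    return np.concatenate([spatial, motion], axis=-1)


def fuse_by_summation(spatial, motion):
    """Add spatial and motion features element-wise (shapes must match)."""
    return spatial + motion


# Hypothetical toy input: 3 frames of size 4x4 with 2 channels.
rng = np.random.default_rng(0)
frames = rng.random((3, 4, 4, 2), dtype=np.float32)

motion = temporal_difference(frames)   # purely local, pixel-to-pixel signal
spatial = frames[1:]                   # stand-in for per-frame appearance features

concat = fuse_by_concatenation(spatial, motion)
summed = fuse_by_summation(spatial, motion)
```

Both fusion variants treat the two feature types as interchangeable channels or addends, and the motion cue itself depends only on per-pixel intensity changes, so an occluded or blurred region changes it directly. This is the behavior the abstract describes as the limitation of existing approaches.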

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications

Methods: ADaptive gradient method with the OPTimal convergence rate