Learning semantical dynamics and spatiotemporal collaboration for human pose estimation in video
Runyang Feng, Haoming Chen

TL;DR
This paper introduces a novel multi-level semantic and spatio-temporal framework for human pose estimation in videos, overcoming limitations of pixel-level methods by capturing semantic correlations and enhancing feature collaboration.
Contribution
It proposes a Multi-Level Semantic Motion Encoder and a Spatial-Motion Mutual Learning module to better model semantic dynamics and feature integration in video pose estimation.
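As a rough illustration of what modeling semantic correlations across frames at multiple levels can mean, the sketch below matches feature vectors between two frames with all-pairs cosine similarity, separately at each feature level. It is not the authors' Multi-Level Semantic Motion Encoder. The level names, the toy feature values, and the functions `cosine`, `correlation_volume`, and `multi_level_correspondence` are hypothetical, and a real encoder would operate on learned backbone features rather than hand-written vectors.

```python
import math
from typing import Dict, List

Features = List[List[float]]  # N locations x C channels at one feature level (hypothetical layout)

def cosine(u: List[float], v: List[float], eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against zero-norm vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def correlation_volume(feat_t: Features, feat_t1: Features) -> List[List[float]]:
    """All-pairs similarity: entry [i][j] compares location i in frame t with location j in frame t+1."""
    return [[cosine(u, v) for v in feat_t1] for u in feat_t]

def multi_level_correspondence(levels_t: Dict[str, Features],
                               levels_t1: Dict[str, Features]) -> Dict[str, List[int]]:
    """At every level, match each location in frame t to its most similar location in frame t+1."""
    matches: Dict[str, List[int]] = {}
    for name, feat_t in levels_t.items():
        corr = correlation_volume(feat_t, levels_t1[name])
        # argmax over frame t+1 locations for each frame t location
        matches[name] = [max(range(len(row)), key=row.__getitem__) for row in corr]
    return matches

# Toy example: at the coarse level the two locations swap places and are rescaled;
# at the fine level every feature vector is halved but stays in place.
frame_t = {"coarse": [[1.0, 0.0], [0.0, 1.0]],
           "fine": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
frame_t1 = {"coarse": [[0.0, 2.0], [3.0, 0.0]],
            "fine": [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]}
print(multi_level_correspondence(frame_t, frame_t1))  # {'coarse': [1, 0], 'fine': [0, 1, 2]}
```

Because cosine similarity ignores a uniform positive rescaling of a feature vector, the rescaled features in frame t+1 are still matched correctly, and the all-pairs comparison lets a location match any location in the next frame rather than only its local neighbourhood. Whether this helps under occlusion or blur depends on the learned features, which this toy does not model.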
Findings
Achieves state-of-the-art results on PoseTrack datasets.
Effectively models semantic relationships across frames.
Enhances robustness to occlusions and image quality issues.
Abstract
Temporal modeling and spatio-temporal collaboration are pivotal techniques for video-based human pose estimation. Most state-of-the-art methods adopt optical flow or temporal difference, learning local visual content correspondence across frames at the pixel level, to capture motion dynamics. However, such a paradigm essentially relies on localized pixel-to-pixel similarity, which neglects the semantical correlations among frames and is vulnerable to image quality degradations (e.g. occlusions or blur). Moreover, existing approaches often combine motion and spatial (appearance) features via simple concatenation or summation, leading to practical challenges in fully leveraging these distinct modalities. In this paper, we present a novel framework that learns multi-level semantical dynamics and dense spatio-temporal collaboration for multi-frame human pose estimation. Specifically, we…
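For contrast, the pixel-level motion cue and the naive fusion schemes that the abstract criticizes can be sketched in a few lines. This is a toy under stated assumptions: frames are small grayscale grids stored as nested lists, per-location features are plain lists of channel values, and the helper names `temporal_difference`, `fuse_concat`, and `fuse_sum` are hypothetical rather than taken from any specific method.

```python
from typing import List

Frame = List[List[float]]     # H x W grayscale intensities (toy image)
Features = List[List[float]]  # N locations x C channels

def temporal_difference(prev: Frame, curr: Frame) -> Frame:
    """Pixel-level motion cue: curr - prev at every pixel."""
    return [[c - p for p, c in zip(row_p, row_c)] for row_p, row_c in zip(prev, curr)]

def fuse_concat(spatial: Features, motion: Features) -> Features:
    """Naive fusion 1: stack spatial and motion channels at each location."""
    return [s + m for s, m in zip(spatial, motion)]  # list concatenation -> C_s + C_m channels

def fuse_sum(spatial: Features, motion: Features) -> Features:
    """Naive fusion 2: element-wise sum; both inputs need the same channel count."""
    fused = []
    for s, m in zip(spatial, motion):
        if len(s) != len(m):
            raise ValueError("summation requires equal channel counts")
        fused.append([a + b for a, b in zip(s, m)])
    return fused

# A uniform brightness shift with no motion still yields a non-zero difference everywhere.
prev = [[0.2, 0.8], [0.2, 0.8]]
curr = [[v + 0.1 for v in row] for row in prev]
print(temporal_difference(prev, curr))
print(fuse_concat([[1.0, 2.0]], [[3.0]]))
print(fuse_sum([[1.0, 2.0]], [[3.0, 4.0]]))
```

In the example, a brightness increase with no motion at all still produces a temporal difference of about 0.1 at every pixel, which is one way a pixel-level cue can be corrupted by image-quality changes. Both fusion functions treat the two feature types identically at every location, with no interaction between them beyond stacking or adding.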
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications
Methods: ADaptive gradient method with the OPTimal convergence rate
