STAR-Pose: Efficient Low-Resolution Video Human Pose Estimation via Spatial-Temporal Adaptive Super-Resolution

Yucheng Jin; Jinyan Chen; Ziyue He; Baojun Han; Furan An

arXiv:2506.16061·cs.CV·June 23, 2025

TL;DR

STAR-Pose is a spatial-temporal super-resolution framework that combines a linear-attention Transformer with an adaptive fusion module, improving both the accuracy and the inference speed of human pose estimation in low-resolution video.

Contribution

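The paper presents an adaptive super-resolution method that pairs a spatial-temporal Transformer (using LeakyReLU-modified linear attention) with a pose-aware compound loss, so that reconstruction favors the structural features that matter for keypoint localization in low-resolution video.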

Findings

01. Achieves up to a 5.2% mAP improvement at 64x48 resolution.

02. Runs inference 2.8x to 4.4x faster than cascaded methods.

03. Outperforms existing approaches on multiple datasets.

Abstract

Human pose estimation in low-resolution videos presents a fundamental challenge in computer vision. Conventional methods either assume high-quality inputs or employ computationally expensive cascaded processing, which limits their deployment in resource-constrained environments. We propose STAR-Pose, a spatial-temporal adaptive super-resolution framework specifically designed for video-based human pose estimation. Our method features a novel spatial-temporal Transformer with LeakyReLU-modified linear attention, which efficiently captures long-range temporal dependencies. The Transformer is complemented by an adaptive fusion module that integrates a parallel CNN branch for local texture enhancement. We also design a pose-aware compound loss to achieve task-oriented super-resolution. This loss guides the network to reconstruct structural features that are most beneficial for keypoint…
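The abstract does not spell out how the LeakyReLU-modified linear attention is computed, so the following is only a minimal sketch of the general linear-attention idea under stated assumptions: LeakyReLU is applied as the feature map on queries and keys, and the normalization term is omitted. The function names, the slope value, and the toy shapes are all hypothetical and are not taken from the paper. The sketch shows why this family of attention scales linearly with sequence length: multiplying keys and values first gives the same result as forming the full token-by-token score matrix.

```python
import random


def leaky_relu(x, slope=0.01):
    """LeakyReLU on a scalar; the slope value is an assumption."""
    return x if x > 0 else slope * x


def feature_map(m, slope=0.01):
    """Apply LeakyReLU element-wise to a matrix stored as a list of rows."""
    return [[leaky_relu(v, slope) for v in row] for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def quadratic_attention(q, k, v):
    """Reference ordering: builds the full N x N score matrix first."""
    scores = matmul(feature_map(q), transpose(feature_map(k)))  # N x N
    return matmul(scores, v)                                    # N x d_v


def linear_attention(q, k, v):
    """Linear ordering: builds a d x d_v summary of keys and values first."""
    kv = matmul(transpose(feature_map(k)), v)  # d x d_v, independent of N
    return matmul(feature_map(q), kv)          # N x d_v


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (hypothetical): 6 tokens, feature dimension 4.
rng = random.Random(0)
n_tokens, dim = 6, 4
q = random_matrix(n_tokens, dim, rng)
k = random_matrix(n_tokens, dim, rng)
v = random_matrix(n_tokens, dim, rng)

out_quad = quadratic_attention(q, k, v)
out_lin = linear_attention(q, k, v)

# Largest element-wise gap between the two orderings (floating-point noise only).
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(out_quad, out_lin)
    for a, b in zip(row_a, row_b)
)
```

Because matrix multiplication is associative, both orderings agree up to rounding error, but the linear ordering never materializes the N x N matrix. Its cost grows with the number of tokens rather than its square, which is the property that makes long temporal windows affordable.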

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Advanced Image Processing Techniques

Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer