StarPose: 3D Human Pose Estimation via Spatial-Temporal Autoregressive Diffusion

Haoxin Yang; Weihong Chen; Xuemiao Xu; Cheng Xu; Peng Xiao; Cuifeng Sun; Shaoyu Huang; Shengfeng He

arXiv:2508.02056 · cs.CV · August 12, 2025

Open Access

TL;DR

StarPose introduces an autoregressive diffusion framework that leverages historical 3D pose predictions and physical guidance to improve accuracy and temporal consistency in monocular 3D human pose estimation.

Contribution

The paper proposes a novel autoregressive diffusion model with modules for historical pose integration and physical guidance, improving the accuracy and temporal coherence of 3D pose predictions.

Findings

01 Outperforms state-of-the-art methods on benchmark datasets.

02 Achieves higher accuracy in 3D pose estimation.

03 Demonstrates improved temporal consistency in pose sequences.

Abstract

Monocular 3D human pose estimation remains a challenging task due to inherent depth ambiguities and occlusions. Compared to traditional methods based on Transformers or Convolutional Neural Networks (CNNs), recent diffusion-based approaches have shown superior performance, leveraging their probabilistic nature and high-fidelity generation capabilities. However, these methods often fail to account for the spatial and temporal correlations across predicted frames, resulting in limited temporal consistency and inferior accuracy in predicted 3D pose sequences. To address these shortcomings, this paper proposes StarPose, an autoregressive diffusion framework that effectively incorporates historical 3D pose predictions and spatial-temporal physical guidance to significantly enhance both the accuracy and temporal coherence of pose predictions. Unlike existing approaches, StarPose models the…
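
The general idea of conditioning each frame's denoising on earlier 3D predictions can be illustrated with a toy loop. The sketch below is not the StarPose implementation: `toy_denoiser`, `estimate_sequence`, `num_steps`, and `history_len` are hypothetical names, the denoiser is a hand-written blend rather than a learned network, and a simple iterative refinement stands in for a real reverse-diffusion schedule. It only shows the autoregressive control flow, in which each new frame starts from noise and is refined with the previously predicted poses as context.

```python
import numpy as np

def toy_denoiser(noisy_pose, pose_2d, history, t):
    """Hypothetical stand-in for a learned denoising network.

    Pulls the noisy 3D sample toward a target built from the 2D input
    (lifted with zero depth) and, if available, the mean of past 3D outputs.
    """
    lifted = np.concatenate([pose_2d, np.zeros((pose_2d.shape[0], 1))], axis=1)
    if len(history) == 0:
        target = lifted
    else:
        target = 0.5 * lifted + 0.5 * np.mean(history, axis=0)
    w = 1.0 / (t + 1)  # weight on the target grows as t approaches 0
    return (1.0 - w) * noisy_pose + w * target

def estimate_sequence(poses_2d, num_steps=10, history_len=3, seed=0):
    """Autoregressive loop: each frame is refined from noise, conditioned
    on the most recent `history_len` predicted 3D poses."""
    rng = np.random.default_rng(seed)
    outputs = []
    for pose_2d in poses_2d:
        history = outputs[-history_len:]  # earlier predictions only
        x = rng.standard_normal((pose_2d.shape[0], 3))  # start from noise
        for t in reversed(range(num_steps)):
            x = toy_denoiser(x, pose_2d, history, t)
        outputs.append(x)
    return np.stack(outputs)

# Usage: 4 frames, 17 joints, 2D keypoints all set to 1.0 (synthetic input).
poses_3d = estimate_sequence(np.ones((4, 17, 2)))
print(poses_3d.shape)
```

In a real system the blend inside `toy_denoiser` would be replaced by a trained network and a proper noise schedule; the point here is only that predictions for frame k feed into the conditioning for frame k+1.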

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning