Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser
Qingyuan Cai, Xuecai Hu, Saihui Hou, Li Yao, Yongzhen Huang

TL;DR
This paper introduces DDHPose, a hierarchical diffusion-based method for 3D human pose estimation that disentangles pose components and models hierarchical spatial-temporal joint relations, achieving state-of-the-art results.
Contribution
The paper proposes a novel hierarchical diffusion model with disentangled pose components and a hierarchical denoiser to improve 3D human pose estimation accuracy.
Findings
Achieves SOTA performance on benchmark datasets.
Effectively models hierarchical joint relationships.
Reduces hierarchical error propagation.
Abstract
Recently, diffusion-based methods for monocular 3D human pose estimation have achieved state-of-the-art (SOTA) performance by directly regressing the 3D joint coordinates from the 2D pose sequence. Although some methods decompose the task into bone length and bone direction prediction based on the human anatomical skeleton to explicitly incorporate more human body prior constraints, the performance of these methods is significantly lower than that of the SOTA diffusion-based methods. This can be attributed to the tree structure of the human skeleton. Direct application of the disentangled method could amplify the accumulation of hierarchical errors, propagating through each hierarchy. Meanwhile, the hierarchical information has not been fully explored by the previous methods. To address these problems, a Disentangled Diffusion-based 3D Human Pose Estimation method with Hierarchical…
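The abstract's argument about disentangled prediction can be shown with a toy example. Absolute joint positions are split into per-bone lengths and unit directions along a kinematic tree, and the pose is then rebuilt by walking that tree from the root. The sketch below is a minimal illustration under stated assumptions, not the DDHPose implementation. The five-joint skeleton (`PARENTS`), the joint names, the coordinates, and the helpers `disentangle` and `reconstruct` are all hypothetical. The same small error (+0.01) is added to every bone length so you can watch it accumulate with depth in the hierarchy.

```python
import math

# Hypothetical toy skeleton (NOT the paper's joint layout): PARENTS[j] is the
# parent of joint j, -1 marks the root, and every parent precedes its children.
PARENTS = [-1, 0, 1, 2, 1]  # root -> spine -> neck -> head, and spine -> shoulder
NAMES = ["root", "spine", "neck", "head", "shoulder"]


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add_scaled(a, s, d):
    # Returns a + s * d for 3D tuples.
    return (a[0] + s * d[0], a[1] + s * d[1], a[2] + s * d[2])


def norm(v):
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def disentangle(joints, parents):
    """Split absolute joint positions into bone lengths and unit bone directions."""
    lengths = [None] * len(joints)
    directions = [None] * len(joints)
    for j, p in enumerate(parents):
        if p < 0:
            continue  # the root has no incoming bone
        bone = sub(joints[j], joints[p])
        length = norm(bone)
        lengths[j] = length
        directions[j] = (bone[0] / length, bone[1] / length, bone[2] / length)
    return lengths, directions


def reconstruct(root, lengths, directions, parents):
    """Rebuild joints from the root outward; each child is placed relative to its parent."""
    joints = [root] * len(parents)
    for j, p in enumerate(parents):
        if p >= 0:
            joints[j] = add_scaled(joints[p], lengths[j], directions[j])
    return joints


pose = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 1.0, 0.0), (0.0, 1.2, 0.0), (0.3, 0.9, 0.0)]
lengths, directions = disentangle(pose, PARENTS)

# Round trip: given the root, lengths and directions fully determine the pose.
rebuilt = reconstruct(pose[0], lengths, directions, PARENTS)

# Inject the same small error into every predicted bone length.
noisy_lengths = [None if l is None else l + 0.01 for l in lengths]
noisy = reconstruct(pose[0], noisy_lengths, directions, PARENTS)
errors = [norm(sub(a, b)) for a, b in zip(noisy, pose)]

for name, err in zip(NAMES, errors):
    print(f"{name:8s} error = {err:.4f}")
```

The per-joint error grows along the chain: 0.01 at the spine, 0.02 at the neck and 0.03 at the head. The root is unaffected. Each joint inherits every error made above it in the tree, which is the hierarchical error accumulation the abstract describes. The shoulder branches off the spine, so it only accumulates two bones' worth of error (about 0.019).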
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Video Surveillance and Tracking Methods
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Layer Normalization · Dropout · Softmax · Dense Connections · Label Smoothing · Diffusion
