# Efficient Diffusion-Based 3D Human Pose Estimation with Hierarchical Temporal Pruning

**Authors:** Yuquan Bi, Hongsong Wang, Xinli Shi, Zhipeng Gui, Jie Gui, Yuan Yan Tang

arXiv: 2508.21363 · 2026-03-10

## TL;DR

This paper introduces an efficient diffusion-based framework for 3D human pose estimation that uses hierarchical temporal pruning to cut inference MACs by 56.8% while maintaining state-of-the-art accuracy on Human3.6M and MPI-INF-3DHP.

## Contribution

The paper proposes Hierarchical Temporal Pruning (HTP), a staged, top-down strategy that dynamically prunes redundant pose tokens at both the frame level and the semantic level, reducing computation while preserving critical motion dynamics.

## Key findings

- Reduces training MACs by 38.5% compared to prior diffusion-based methods
- Reduces inference MACs by 56.8%
- Improves inference speed by an average of 81.1% on Human3.6M and MPI-INF-3DHP
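
To read these percentages as multipliers, a small arithmetic sketch may help. It assumes that a MACs reduction of p% leaves a fraction (1 - p/100) of the baseline, and that an 81.1% speed improvement means throughput rises by a factor of 1.811. This summary does not state which convention the paper uses, and the function names below are illustrative only.

```python
def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline cost left after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


def latency_ratio_from_speedup(speedup_pct: float) -> float:
    """Per-sample latency relative to the baseline.

    ASSUMPTION: the percentage is a throughput gain (samples per second),
    so latency scales as 1 / (1 + gain).
    """
    return 1.0 / (1.0 + speedup_pct / 100.0)


train_macs = remaining_fraction(38.5)        # share of baseline training MACs
infer_macs = remaining_fraction(56.8)        # share of baseline inference MACs
latency = latency_ratio_from_speedup(81.1)   # relative latency under the assumption

print(f"training MACs:  {train_macs:.3f}x baseline")
print(f"inference MACs: {infer_macs:.3f}x baseline")
print(f"latency:        {latency:.3f}x baseline")
```

Under these assumptions, inference needs a little under half of the baseline MACs, and per-sample latency drops to roughly 55% of the baseline. If the paper instead reports the 81.1% figure as a reduction in runtime, the latency ratio would be computed with `remaining_fraction` rather than `latency_ratio_from_speedup`.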

## Abstract

Diffusion models have demonstrated strong capabilities in generating high-fidelity 3D human poses, yet their iterative nature and multi-hypothesis requirements incur substantial computational cost. In this paper, we propose an Efficient Diffusion-Based 3D Human Pose Estimation framework with a Hierarchical Temporal Pruning (HTP) strategy, which dynamically prunes redundant pose tokens across both frame and semantic levels while preserving critical motion dynamics. HTP operates in a staged, top-down manner: (1) Temporal Correlation-Enhanced Pruning (TCEP) identifies essential frames by analyzing inter-frame motion correlations through adaptive temporal graph construction; (2) Sparse-Focused Temporal MHSA (SFT MHSA) leverages the resulting frame-level sparsity to reduce attention computation, focusing on motion-relevant tokens; and (3) Mask-Guided Pose Token Pruner (MGPTP) performs fine-grained semantic pruning via clustering, retaining only the most informative pose tokens. Experiments on Human3.6M and MPI-INF-3DHP show that HTP reduces training MACs by 38.5%, inference MACs by 56.8%, and improves inference speed by an average of 81.1% compared to prior diffusion-based methods, while achieving state-of-the-art performance.
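
The general idea behind frame-level pruning can be illustrated with a toy sketch. This is not the paper's TCEP module: there is no adaptive temporal graph and nothing is learned. It simply assumes each frame is a feature vector, scores a frame's redundancy as its mean cosine similarity to the other frames, and keeps the least redundant ones. The names `select_frames` and `attention_pairs` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_frames(tokens, keep):
    """Return the sorted indices of the `keep` least redundant frames.

    ASSUMPTION (illustrative heuristic, not the paper's criterion):
    redundancy = mean cosine similarity to every other frame.
    """
    n = len(tokens)
    if not 0 < keep <= n:
        raise ValueError("keep must be between 1 and len(tokens)")
    redundancy = []
    for i in range(n):
        sims = [cosine(tokens[i], tokens[j]) for j in range(n) if j != i]
        redundancy.append(sum(sims) / len(sims) if sims else 0.0)
    # Lowest redundancy first; ties broken by frame index.
    order = sorted(range(n), key=lambda i: (redundancy[i], i))
    # Restore temporal order for the frames that survive.
    return sorted(order[:keep])


def attention_pairs(num_tokens):
    """Query-key pairs scored by dense self-attention over the tokens."""
    return num_tokens * num_tokens


# Five toy frame features; frame 3 moves differently from the rest.
frames = [[1.0, 0.0], [1.0, 0.01], [1.0, 0.02], [0.0, 1.0], [1.0, 0.03]]
kept = select_frames(frames, keep=2)
print("kept frames:", kept)
print("attention pairs:", attention_pairs(len(frames)), "->", attention_pairs(len(kept)))
```

Because dense temporal self-attention scales quadratically with the number of tokens, dropping frames before attention shrinks the pairwise score matrix accordingly, which is the kind of saving the abstract attributes to exploiting frame-level sparsity in SFT MHSA.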

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21363/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21363/full.md

## References

73 references — full list in the complete paper: https://tomesphere.com/paper/2508.21363/full.md

---
Source: https://tomesphere.com/paper/2508.21363