HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation

Yajie Fu; Chaorui Huang; Junwei Li; Hui Kong; Yibin Tian; Huakang Li; Zhiyuan Zhang

arXiv:2505.04276 · cs.CV · May 8, 2025

TL;DR

HDiffTG is a lightweight hybrid model combining Transformer, GCN, and diffusion techniques to improve 3D human pose estimation accuracy, robustness, and efficiency, especially under occlusions and complex scenarios.

Contribution

It introduces an integrated framework that combines global spatiotemporal modeling (Transformer), local skeletal modeling (GCN), and step-by-step refinement (diffusion) to improve 3D pose estimation.

Findings

01. Achieves state-of-the-art results on the MPI-INF-3DHP dataset.

02. Demonstrates robustness in noisy and occluded environments.

03. Maintains computational efficiency through lightweight optimizations.

Abstract

We propose HDiffTG, a novel 3D Human Pose Estimation (3DHPE) method that integrates a Transformer, a Graph Convolutional Network (GCN), and a diffusion model into a unified framework. HDiffTG leverages the strengths of these techniques to significantly improve pose estimation accuracy and robustness while maintaining a lightweight design. The Transformer captures global spatiotemporal dependencies, the GCN models local skeletal structures, and the diffusion model provides step-by-step optimization for fine-tuning, achieving a complementary balance between global and local features. This integration enhances the model's ability to handle pose estimation under occlusions and in complex scenarios. Furthermore, we introduce lightweight optimizations to the integrated model and refine the objective function design to reduce computational overhead without compromising performance. Evaluation…
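
The division of labor described in the abstract (global mixing, local skeletal mixing, step-by-step refinement) can be illustrated with a toy sketch. This is not the authors' architecture: the 5-joint skeleton, the parameter-free attention, the averaging "denoiser", and all function names below are assumptions made for illustration. A real model would use learned weights, a noise schedule, and a trained denoising network.

```python
import math
import random

# Hypothetical 5-joint skeleton; the paper's actual joint layout is not given here.
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]
NUM_JOINTS = 5


def normalized_adjacency(edges, n):
    """Return D^-1/2 (A + I) D^-1/2, the usual GCN propagation matrix."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_mix(pose, adj):
    """Local step: each joint becomes a weighted sum of itself and its skeletal neighbours."""
    n, dims = len(pose), len(pose[0])
    return [[sum(adj[i][k] * pose[k][d] for k in range(n)) for d in range(dims)]
            for i in range(n)]


def attention_mix(pose):
    """Global step: weight-free self-attention across all joints (softmax of scaled dot products)."""
    n, dims = len(pose), len(pose[0])
    scale = math.sqrt(dims)
    out = []
    for i in range(n):
        scores = [sum(pose[i][d] * pose[k][d] for d in range(dims)) / scale for k in range(n)]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * pose[k][d] for k, w in enumerate(weights))
                    for d in range(dims)])
    return out


def denoise_estimate(pose, adj):
    """Stand-in denoiser (assumption): average the global and local views."""
    glob, loc = attention_mix(pose), gcn_mix(pose, adj)
    return [[0.5 * (g + l_) for g, l_ in zip(grow, lrow)] for grow, lrow in zip(glob, loc)]


def refine(noisy_pose, adj, steps=10, step_size=0.3):
    """Diffusion-style loop: repeatedly nudge the pose toward the denoiser's estimate."""
    pose = [row[:] for row in noisy_pose]  # copy so the input is left untouched
    for _ in range(steps):
        target = denoise_estimate(pose, adj)
        pose = [[p + step_size * (t - p) for p, t in zip(prow, trow)]
                for prow, trow in zip(pose, target)]
    return pose


random.seed(0)
adj = normalized_adjacency(EDGES, NUM_JOINTS)
noisy = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(NUM_JOINTS)]
refined = refine(noisy, adj)
```

The sketch only shows how the three roles compose: attention lets every joint see every other joint, the adjacency matrix restricts mixing to connected joints, and the loop applies both repeatedly rather than in a single pass.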

Code & Models

Repositories

circejie/hdifftg
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Human Motion and Animation

Methods: Linear Layer · Multi-Head Attention · Dense Connections · Adam · Attention Is All You Need · Dropout · Diffusion · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding