HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation
Yajie Fu, Chaorui Huang, Junwei Li, Hui Kong, Yibin Tian, Huakang Li, Zhiyuan Zhang

TL;DR
HDiffTG is a lightweight hybrid model combining Transformer, GCN, and diffusion techniques to improve 3D human pose estimation accuracy, robustness, and efficiency, especially under occlusions and complex scenarios.
Contribution
It introduces an integrated framework that combines global spatiotemporal modeling (Transformer), local skeletal-structure modeling (GCN), and step-by-step diffusion-based refinement to improve 3D pose estimation performance.
Findings
Achieves state-of-the-art results on the MPI-INF-3DHP dataset.
Demonstrates robustness in noisy and occluded environments.
Maintains computational efficiency with lightweight optimizations.
Abstract
We propose HDiffTG, a novel 3D Human Pose Estimation (3DHPE) method that integrates a Transformer, a Graph Convolutional Network (GCN), and a diffusion model into a unified framework. HDiffTG leverages the strengths of these techniques to significantly improve pose estimation accuracy and robustness while maintaining a lightweight design. The Transformer captures global spatiotemporal dependencies, the GCN models local skeletal structures, and the diffusion model provides step-by-step optimization for fine-tuning, achieving a complementary balance between global and local features. This integration enhances the model's ability to handle pose estimation under occlusions and in complex scenarios. Furthermore, we introduce lightweight optimizations to the integrated model and refine the objective function design to reduce computational overhead without compromising performance. Evaluation…
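The division of labor described in the abstract (global attention, local graph aggregation, step-by-step refinement) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the 5-joint skeleton, the function names, the absence of learned weights, and the simple averaging update that stands in for a trained reverse-diffusion process. It is not the authors' implementation and only shows how information flows in each kind of component.

```python
import math

# Hypothetical 5-joint toy skeleton (not the paper's joint set):
# 0 pelvis, 1 spine, 2 head, 3 left hand, 4 right hand.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Return D^-1/2 (A + I) D^-1/2, the usual GCN propagation matrix."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def gcn_branch(pose, adj):
    """Local branch: each joint aggregates only itself and its bone neighbours."""
    dims = len(pose[0])
    return [[sum(adj[i][k] * pose[k][d] for k in range(len(pose)))
             for d in range(dims)]
            for i in range(len(pose))]


def attention_branch(pose):
    """Global branch: scaled dot-product self-attention over all joints.

    Queries, keys, and values are the raw coordinates (no learned projections).
    """
    scale = math.sqrt(len(pose[0]))
    out = []
    for q in pose:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in pose]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[d] for w, v in zip(weights, pose))
                    for d in range(len(q))])
    return out


def refine(noisy_pose, adj, steps=10, rate=0.3):
    """Illustrative stand-in for step-by-step refinement.

    A real diffusion model would apply a trained denoiser at each step;
    here the pose is simply nudged toward the average of both branches.
    """
    pose = [row[:] for row in noisy_pose]
    for _ in range(steps):
        local = gcn_branch(pose, adj)
        glob = attention_branch(pose)
        pose = [[p + rate * (0.5 * (l + g) - p)
                 for p, l, g in zip(p_row, l_row, g_row)]
                for p_row, l_row, g_row in zip(pose, local, glob)]
    return pose


adj = normalized_adjacency(NUM_JOINTS, EDGES)
pose = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 1.0, 0.0],
        [-0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]
refined = refine(pose, adj)
```

In this sketch, perturbing the head joint changes the `gcn_branch` output only at the head and its neighbour (the spine), whereas the `attention_branch` output changes at every joint. That contrast is the complementary local/global behavior the abstract refers to.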
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Human Motion and Animation
Methods: Linear Layer · Multi-Head Attention · Dense Connections · Adam · Attention Is All You Need · Dropout · Diffusion · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding
