DreamPose3D: Hallucinative Diffusion with Prompt Learning for 3D Human Pose Estimation
Jerrin Bright, Yuhao Chen, John S. Zelek

TL;DR
DreamPose3D introduces a diffusion-based framework with prompt learning and hallucination techniques to improve 3D human pose estimation by modeling temporal coherence, structural joint relationships, and high-level motion intent.
Contribution
The paper proposes a diffusion-based approach that incorporates action prompts and a hallucinative decoder for more accurate and robust 3D pose estimation.
Findings
Achieves state-of-the-art results on Human3.6M and MPI-3DHP datasets.
Demonstrates robustness on noisy, ambiguous inputs in a baseball broadcast dataset.
Effectively models temporal coherence and joint relationships in 3D pose prediction.
Abstract
Accurate 3D human pose estimation remains a critical yet unresolved challenge, requiring both temporal coherence across frames and fine-grained modeling of joint relationships. However, most existing methods rely solely on geometric cues and predict each 3D pose independently, which limits their ability to resolve ambiguous motions and generalize to real-world scenarios. Inspired by how humans understand and anticipate motion, we introduce DreamPose3D, a diffusion-based framework that combines action-aware reasoning with temporal imagination for 3D pose estimation. DreamPose3D dynamically conditions the denoising process using task-relevant action prompts extracted from 2D pose sequences, capturing high-level intent. To model the structural relationships between joints effectively, we introduce a representation encoder that incorporates kinematic joint affinity into the attention…
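The abstract is cut off before it describes how kinematic joint affinity enters the attention, so the paper's actual formulation is not available here. As a rough illustration of the general idea only, the sketch below assumes one common way to inject skeleton structure into self-attention: an additive bias that penalizes attention between joints that are many bones apart. The toy skeleton `BONES`, the helpers `hop_distance` and `affinity_attention`, and the `scale` parameter are all hypothetical names chosen for this example, not taken from DreamPose3D.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton as (joint_a, joint_b) bones.
# This is NOT the Human3.6M skeleton; it only keeps the example small.
BONES = [(0, 1), (1, 2), (1, 3), (3, 4)]


def hop_distance(num_joints, bones):
    """All-pairs hop distance along the skeleton graph (Floyd-Warshall)."""
    dist = np.full((num_joints, num_joints), np.inf)
    np.fill_diagonal(dist, 0.0)
    for a, b in bones:
        dist[a, b] = dist[b, a] = 1.0
    for k in range(num_joints):
        # Relax every pair (i, j) through intermediate joint k.
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist


def affinity_attention(x, dist, scale=1.0):
    """Single-head self-attention over joints with a kinematic bias.

    x:    (num_joints, dim) joint features, used as queries, keys and values.
    dist: (num_joints, num_joints) hop distances between joints.
    The bias -scale * dist is an assumed form of "joint affinity":
    joints that are closer on the skeleton receive larger attention logits.
    """
    dim = x.shape[-1]
    logits = x @ x.T / np.sqrt(dim) - scale * dist
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


dist = hop_distance(5, BONES)
features = np.random.default_rng(0).normal(size=(5, 8))
out, weights = affinity_attention(features, dist, scale=1.0)
```

With `scale=0` this reduces to plain dot-product attention, which makes the bias easy to ablate in a toy setting. A learned, per-head bias table indexed by hop distance would be a natural variant, again as an assumption rather than a description of the paper.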
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Human Motion and Animation
