Flexible Geometric Guidance for Probabilistic Human Pose Estimation with Diffusion Models
Francis Snelgar, Ming Xu, Stephen Gould, Liang Zheng, Akshay Asthana

TL;DR
This paper introduces a diffusion model-based framework for probabilistic 3D human pose estimation from 2D images, enabling sampling of multiple plausible poses and demonstrating state-of-the-art results without requiring paired 2D-3D training data.
Contribution
It proposes a novel guidance framework using diffusion models for pose estimation, allowing flexible sampling and generalization to new tasks without training dedicated models.
Findings
Achieves state-of-the-art performance without paired 2D-3D data.
Demonstrates strong generalization on unseen datasets.
Enables pose generation and completion without additional training.
Abstract
3D human pose estimation from 2D images is a challenging problem due to depth ambiguity and occlusion. Because of these challenges the task is underdetermined: there exist multiple, possibly infinitely many, poses that are plausible given the image. Despite this, many prior works assume the existence of a deterministic mapping and estimate a single pose per image. Furthermore, methods based on machine learning require a large amount of paired 2D-3D data to train and generalize poorly to unseen scenarios. To address both of these issues, we propose a framework for pose estimation using diffusion models, which enables sampling from a probability distribution over plausible poses that are consistent with a 2D image. Our approach falls under the guidance framework for conditional generation, and guides samples from an unconditional diffusion model, trained only on…
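The guidance idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes an analytic zero-mean Gaussian as a stand-in for the learned unconditional pose prior, an orthographic camera (depth is simply dropped), and an annealed Langevin sampler in place of a full reverse diffusion process. All names (`prior_score`, `guidance_grad`, `sample_guided_poses`) and constants are hypothetical.

```python
import numpy as np

PRIOR_STD = 1.0  # assumed spread of the toy Gaussian "pose prior"
OBS_STD = 0.1    # assumed noise on the observed 2D keypoints


def prior_score(x, noise_std):
    """Score of the toy zero-mean Gaussian prior blurred by noise_std."""
    return -x / (PRIOR_STD**2 + noise_std**2)


def guidance_grad(x, keypoints_2d, noise_std):
    """Gradient of a Gaussian log-likelihood of the 2D keypoints.

    Assumes an orthographic projection that keeps (x, y) and drops z,
    so the depth coordinate receives no guidance at all.
    """
    grad = np.zeros_like(x)
    grad[..., :2] = (keypoints_2d - x[..., :2]) / (OBS_STD**2 + noise_std**2)
    return grad


def sample_guided_poses(keypoints_2d, n_samples=256, n_levels=20,
                        steps_per_level=20, seed=0):
    """Draw 3D joint positions consistent with the given 2D keypoints."""
    rng = np.random.default_rng(seed)
    n_joints = keypoints_2d.shape[0]
    noise_levels = np.geomspace(1.0, 0.01, n_levels)  # anneal high -> low

    # Start from the prior at the highest noise level.
    init_std = np.sqrt(PRIOR_STD**2 + noise_levels[0] ** 2)
    x = rng.normal(scale=init_std, size=(n_samples, n_joints, 3))

    for s in noise_levels:
        # Step size scaled by the combined curvature to keep updates stable.
        curvature = 1.0 / (PRIOR_STD**2 + s**2) + 1.0 / (OBS_STD**2 + s**2)
        step = 0.2 / curvature
        for _ in range(steps_per_level):
            # Guided score = unconditional prior score + guidance gradient.
            score = prior_score(x, s) + guidance_grad(x, keypoints_2d, s)
            noise = rng.normal(size=x.shape)
            x = x + step * score + np.sqrt(2.0 * step) * noise
    return x


if __name__ == "__main__":
    keypoints = np.array([[0.5, -0.3], [0.0, 0.8]])  # two joints, made up
    poses = sample_guided_poses(keypoints)
    print("mean xy per joint:\n", poses[..., :2].mean(axis=0))
    print("depth std:", poses[..., 2].std())
```

In this sketch the sampled x and y coordinates concentrate around the observed keypoints, while the depth coordinate stays spread out under the prior, which mirrors the depth ambiguity the abstract describes. Swapping `guidance_grad` for a different constraint changes the task without retraining the prior, which is the flexibility the guidance framing is meant to convey.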
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Human Motion and Animation
