Flexible Geometric Guidance for Probabilistic Human Pose Estimation with Diffusion Models

Francis Snelgar; Ming Xu; Stephen Gould; Liang Zheng; Akshay Asthana

arXiv:2602.03126 · cs.CV · February 4, 2026

TL;DR

This paper introduces a diffusion model-based framework for probabilistic 3D human pose estimation from 2D images, enabling sampling of multiple plausible poses and demonstrating state-of-the-art results without requiring paired 2D-3D training data.

Contribution

It proposes a novel guidance framework using diffusion models for pose estimation, allowing flexible sampling and generalization to new tasks without training dedicated models.

Findings

01. Achieves state-of-the-art performance without paired 2D-3D data.

02. Demonstrates strong generalization on unseen datasets.

03. Enables pose generation and completion without additional training.

Abstract

3D human pose estimation from 2D images is a challenging problem due to depth ambiguity and occlusion. Because of these challenges, the task is underdetermined: there exist multiple, possibly infinitely many, poses that are plausible given the image. Despite this, many prior works assume the existence of a deterministic mapping and estimate a single pose given an image. Furthermore, methods based on machine learning require a large amount of paired 2D-3D data to train and generalize poorly to unseen scenarios. To address both of these issues, we propose a framework for pose estimation using diffusion models, which enables sampling from a probability distribution over plausible poses that are consistent with a 2D image. Our approach falls under the guidance framework for conditional generation, and guides samples from an unconditional diffusion model, trained only on…
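As a rough illustration of the guidance idea described in the abstract, the toy sketch below adds the gradient of a 2D reprojection log-likelihood to the score of an unconditional prior and samples with unadjusted Langevin dynamics. It is not the authors' method: the standard normal prior stands in for a trained diffusion model, the orthographic projection and the Gaussian observation noise `sigma_obs` are assumptions, and the names `prior_score`, `guidance_grad`, and `guided_langevin` are hypothetical.

```python
import numpy as np

def prior_score(x):
    # Score of a standard normal prior N(0, I): grad log p(x) = -x.
    # ASSUMPTION: stands in for the score of a trained unconditional model.
    return -x

def guidance_grad(x, y2d, sigma_obs):
    # Gradient w.r.t. x of log N(y2d | P x, sigma_obs^2 I), where the
    # (assumed) orthographic projection P keeps x, y and drops depth z.
    g = np.zeros_like(x)
    g[..., :2] = (y2d - x[..., :2]) / sigma_obs**2
    return g

def guided_langevin(y2d, n_samples=256, n_steps=500, step=1e-3,
                    sigma_obs=0.1, seed=0):
    # y2d: array of shape (n_joints, 2) with observed 2D joint positions.
    # Returns samples of shape (n_samples, n_joints, 3).
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples,) + y2d.shape[:-1] + (3,))
    for _ in range(n_steps):
        # Guided score = prior score + gradient of the observation term.
        score = prior_score(x) + guidance_grad(x, y2d, sigma_obs)
        noise = rng.standard_normal(x.shape)
        x = x + step * score + np.sqrt(2.0 * step) * noise
    return x

if __name__ == "__main__":
    # Three made-up 2D joints; values are illustrative only.
    y2d = np.array([[0.5, -0.2], [0.1, 0.8], [-0.6, 0.3]])
    samples = guided_langevin(y2d)
    print("mean xy per joint:\n", samples[..., :2].mean(axis=0))
    print("depth std per joint:", samples[..., 2].std(axis=0))
```

In this toy setting the sampled x and y coordinates concentrate around the 2D observation while the depth coordinate stays as spread out as the prior, which mirrors the depth ambiguity the abstract describes: many 3D poses remain consistent with the same 2D evidence.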

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Human Motion and Animation