SnapPose3D: Diffusion-Based Single-Frame 2D-to-3D Lifting of Human Poses

Alessandro Simoni; Riccardo Catalini; Davide Di Nucci; Guido Borghi; Davide Davoli; Lorenzo Garattoni; Gianpiero Francesca; Yuki Kawana; Roberto Vezzani

arXiv:2604.26620 · cs.CV · April 30, 2026

TL;DR

SnapPose3D introduces a diffusion-based framework for single-frame 2D-to-3D human pose lifting, generating multiple hypotheses to address depth ambiguity and joint uncertainty, achieving state-of-the-art results.

Contribution

SnapPose3D is the first to leverage diffusion models for multi-hypothesis 2D-to-3D pose estimation from single frames, with no temporal processing.

Findings

01

Achieves state-of-the-art results on benchmark datasets.

02

Effectively generates multiple plausible 3D pose hypotheses.

03

Improves accuracy by aggregating multiple pose predictions.

Abstract

Depth ambiguity and joint uncertainty are the two main obstacles to accurate human pose prediction with the 2D-to-3D lifting methods proposed in the literature. Both arise because a 2D joint location can map to multiple 3D positions, which yields multiple plausible final poses. We therefore propose leveraging the generative capability of diffusion-based models to predict multiple hypotheses and aggregate them into a final, accurate pose. To this end, we introduce SnapPose3D, a pose-lifting framework trained deterministically to denoise 3D poses conditioned on both visual context and 2D pose features. At inference time, SnapPose3D adopts a probabilistic approach, generating multiple hypotheses through random sampling from a unit Gaussian distribution. Unlike most previous methods that address pose ambiguity by processing temporal sequences,…
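
The inference procedure described in the abstract (draw several samples from a unit Gaussian, denoise each one conditioned on the 2D pose, then aggregate) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration: `toy_denoiser` is a hypothetical stand-in and not the SnapPose3D network, visual-context conditioning is omitted, the 17-joint skeleton and the count of 20 hypotheses are arbitrary, and mean aggregation is only one possible choice, since the excerpt above does not say how the hypotheses are combined.

```python
import numpy as np


def toy_denoiser(noise, pose_2d, steps=10):
    """Hypothetical stand-in for a trained denoiser (NOT the paper's model).

    It pulls the x, y coordinates toward the 2D condition and damps the
    depth noise, so the remaining spread across samples lies along the
    ambiguous depth axis.
    """
    pose = noise.copy()
    for _ in range(steps):
        # Move x, y halfway toward the conditioning 2D joints.
        pose[..., :2] += 0.5 * (pose_2d - pose[..., :2])
        # Shrink the depth component a little at every step.
        pose[..., 2] *= 0.8
    return pose


def sample_hypotheses(pose_2d, num_hypotheses, rng):
    """Draw one unit-Gaussian sample per hypothesis and denoise each."""
    num_joints = pose_2d.shape[0]
    noise = rng.standard_normal((num_hypotheses, num_joints, 3))
    return toy_denoiser(noise, pose_2d)


def aggregate(hypotheses):
    """Assumed aggregation rule: per-joint mean over all hypotheses."""
    return hypotheses.mean(axis=0)


rng = np.random.default_rng(0)
pose_2d = rng.uniform(-1.0, 1.0, size=(17, 2))  # assumed 17-joint skeleton
hypotheses = sample_hypotheses(pose_2d, num_hypotheses=20, rng=rng)
final_pose = aggregate(hypotheses)

print(hypotheses.shape)  # (20, 17, 3)
print(final_pose.shape)  # (17, 3)
```

In this toy setup all hypotheses agree on x and y but differ in depth, which mirrors the ambiguity the abstract describes; a real model would replace `toy_denoiser` with a learned network conditioned on image and 2D pose features.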
