D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement
Danqi Yan, Qing Gao, Yuepeng Qian, Xinxing Chen, Chenglong Fu, and Yuquan Leng

TL;DR
This paper introduces D3PRefiner, a diffusion-based neural network that refines 3D human poses estimated from monocular images, reducing MPJPE by at least 10.3% and P-MPJPE by at least 11.0%.
Contribution
It proposes a diffusion-model approach to refining 3D human poses that uses a Gaussian distribution conditioned on 2D and noisy 3D data and can be applied to the output of any existing 3D pose estimator.
Findings
Reduces MPJPE by at least 10.3%
Reduces P-MPJPE by at least 11.0%
Improves accuracy across different models and input sequences
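For readers unfamiliar with the two metrics, the following is a minimal sketch of how MPJPE (mean per-joint position error) and P-MPJPE (the same error after Procrustes alignment) are commonly computed. It is not the authors' evaluation code: the function names, the (J, 3) array layout, and the relative_reduction helper are assumptions made for illustration.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joints."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


def procrustes_align(pred, gt):
    """Similarity-align pred (J, 3) to gt (J, 3) with scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance gives the optimal rotation (Kabsch).
    U, S, Vt = np.linalg.svd(p.T @ g)
    if np.linalg.det(U @ Vt) < 0:
        # Flip the weakest axis so the result is a rotation, not a reflection.
        U[:, -1] *= -1
        S[-1] *= -1
    R = U @ Vt                        # acts on row vectors: p @ R ~ g
    scale = S.sum() / (p ** 2).sum()  # least-squares optimal isotropic scale
    return scale * p @ R + mu_g


def p_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)


def relative_reduction(before, after):
    """Percentage by which an error metric drops from `before` to `after`."""
    return 100.0 * (before - after) / before
```

P-MPJPE removes global rotation, scale, and translation before measuring error, so it isolates mistakes in the pose's shape, whereas MPJPE also penalizes global misalignment. A "reduces MPJPE by at least 10.3%" claim corresponds to relative_reduction(error_before_refinement, error_after_refinement) being at least 10.3.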
Abstract
Three-dimensional (3D) human pose estimation using a monocular camera has gained increasing attention due to its ease of implementation and the abundance of data available from daily life. However, owing to the inherent depth ambiguity in images, the accuracy of existing monocular camera-based 3D pose estimation methods remains unsatisfactory, and the estimated 3D poses usually include much noise. By observing the histogram of this noise, we find each dimension of the noise follows a certain distribution, which indicates the possibility for a neural network to learn the mapping between noisy poses and ground truth poses. In this work, in order to obtain more accurate 3D poses, a Diffusion-based 3D Pose Refiner (D3PRefiner) is proposed to refine the output of any existing 3D pose estimator. We first introduce a conditional multivariate Gaussian distribution to model the distribution of…
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Hand Gesture Recognition Systems
Methods: Procrustes · Diffusion
