FMPose3D: monocular 3D pose estimation via flow matching
Ti Wang, Xiaohang Yu, Mackenzie Weygandt Mathis

TL;DR
FMPose3D introduces an efficient flow matching-based framework for monocular 3D pose estimation, generating multiple plausible hypotheses with fewer inference steps and achieving state-of-the-art results on human and animal pose benchmarks.
Contribution
FMPose3D formulates 3D pose estimation as a conditional distribution transport problem using ODE-based flow matching, enabling fast and diverse hypothesis generation from 2D inputs.
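For readers new to flow matching, the transport view can be written in generic notation. The symbols below (x_t for the evolving 3D pose sample, c for the 2D conditioning, v_theta for the learned velocity field) and the linear interpolation path are assumptions based on common flow-matching practice; this page does not confirm the paper's exact notation or training objective.

```latex
% Sampling: integrate the learned velocity field from noise (t = 0) to a pose (t = 1).
\frac{\mathrm{d}x_t}{\mathrm{d}t} = v_\theta(x_t, t, c),
\qquad x_0 \sim \mathcal{N}(0, I), \quad t \in [0, 1]

% A common training objective (assumed here): regress the velocity of a
% straight-line path between a noise sample x_0 and a ground-truth pose x_1.
x_t = (1 - t)\,x_0 + t\,x_1,
\qquad
\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1,\,c}
\left\| v_\theta(x_t, t, c) - (x_1 - x_0) \right\|^2
```

Drawing several independent x_0 and integrating each one yields several pose hypotheses for the same 2D input.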
Findings
Surpasses existing methods on Human3.6M and MPI-INF-3DHP benchmarks.
Achieves state-of-the-art performance on Animal3D and CtrlAni3D datasets.
Efficiently generates multiple pose hypotheses with fewer inference steps.
Abstract
Monocular 3D pose estimation is fundamentally ill-posed due to depth ambiguity and occlusions, thereby motivating probabilistic methods that generate multiple plausible 3D pose hypotheses. In particular, diffusion-based models have recently demonstrated strong performance, but their iterative denoising process typically requires many timesteps for each prediction, making inference computationally expensive. In contrast, we leverage Flow Matching (FM) to learn a velocity field defined by an Ordinary Differential Equation (ODE), enabling efficient generation of 3D pose samples with only a few integration steps. We propose a novel generative pose estimation framework, FMPose3D, that formulates 3D pose estimation as a conditional distribution transport problem. It continuously transports samples from a standard Gaussian prior to the distribution of plausible 3D poses conditioned only on 2D…
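The sampling procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_velocity` is a hypothetical closed-form stand-in for a trained network, chosen so the example runs without any weights, and the "naive lift" of the 2D pose (zero depth), the `spread` parameter, and the 17-joint layout are all illustrative assumptions. Only the overall pattern (draw Gaussian noise, then take a few Euler steps along a conditional velocity field) reflects the text.

```python
import numpy as np

def toy_velocity(x, t, pose_2d, spread=0.5):
    """Hypothetical stand-in for a learned velocity network v(x_t, t, c).

    Closed-form field that transports N(0, I) to a Gaussian centred on a
    naive lift of the 2D pose (depth = 0) with standard deviation `spread`.
    A real model would replace this with a trained neural network.
    """
    # Naive lift (assumption): append a zero depth coordinate to each joint.
    zeros = np.zeros((pose_2d.shape[0], 1))
    target_mean = np.concatenate([pose_2d, zeros], axis=1)   # (J, 3)
    var_t = (1.0 - t) ** 2 + (t * spread) ** 2               # path variance at time t
    coeff = (t * spread ** 2 - (1.0 - t)) / var_t            # d(log sigma_t)/dt
    return target_mean + coeff * (x - t * target_mean)

def sample_hypotheses(pose_2d, num_hypotheses=8, num_steps=10,
                      velocity=toy_velocity, seed=0):
    """Draw several 3D pose hypotheses by Euler-integrating dx/dt = v(x, t, c)."""
    rng = np.random.default_rng(seed)
    num_joints = pose_2d.shape[0]
    # One independent Gaussian noise sample per hypothesis.
    x = rng.standard_normal((num_hypotheses, num_joints, 3))
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity(x, t, pose_2d)                 # explicit Euler step
    return x

# Usage: 17 joints with random 2D coordinates (illustrative input only).
pose_2d = np.random.default_rng(1).uniform(-1.0, 1.0, size=(17, 2))
hypotheses = sample_hypotheses(pose_2d, num_hypotheses=8, num_steps=10)
print(hypotheses.shape)  # (8, 17, 3)
```

The number of network evaluations equals `num_steps`, which is why a velocity field that can be integrated accurately in a few steps makes multi-hypothesis inference cheap. With few steps, the Euler discretization only approximates the exact transport.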
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning
