FMPose3D: monocular 3D pose estimation via flow matching

Ti Wang; Xiaohang Yu; Mackenzie Weygandt Mathis

arXiv:2602.05755 · cs.CV · February 6, 2026

Open Access · 1 model

TL;DR

FMPose3D introduces an efficient flow matching-based framework for monocular 3D pose estimation, generating multiple plausible hypotheses with fewer inference steps and achieving state-of-the-art results on human and animal pose benchmarks.

Contribution

FMPose3D formulates 3D pose estimation as a conditional distribution transport problem using ODE-based flow matching, enabling fast and diverse hypothesis generation from 2D inputs.
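
For orientation, one common way to write a conditional flow matching setup is shown here. The linear interpolation path and the symbols $x_0$, $x_1$, $c$, and $v_\theta$ are assumptions chosen for illustration; the summary on this page does not state the paper's exact path or loss.

```latex
% Assumed notation: x_0 is a Gaussian sample, x_1 a ground-truth 3D pose,
% c the 2D conditioning input, v_\theta the learned velocity field.
\begin{align}
  x_t &= (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I),\; t \in [0, 1] \\
  \frac{\mathrm{d}x_t}{\mathrm{d}t} &= v_\theta(x_t, t, c) \\
  \mathcal{L}(\theta) &= \mathbb{E}_{t,\,x_0,\,x_1}
      \bigl\| v_\theta(x_t, t, c) - (x_1 - x_0) \bigr\|^2
\end{align}
```

Under this assumed path, the regression target $x_1 - x_0$ is the time derivative of $x_t$, so integrating the learned field from $t = 0$ to $t = 1$ carries prior samples toward the pose distribution.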

Findings

01 Surpasses existing methods on the Human3.6M and MPI-INF-3DHP benchmarks.

02 Achieves state-of-the-art performance on the Animal3D and CtrlAni3D datasets.

03 Efficiently generates multiple pose hypotheses with fewer inference steps.

Abstract

Monocular 3D pose estimation is fundamentally ill-posed due to depth ambiguity and occlusions, thereby motivating probabilistic methods that generate multiple plausible 3D pose hypotheses. In particular, diffusion-based models have recently demonstrated strong performance, but their iterative denoising process typically requires many timesteps for each prediction, making inference computationally expensive. In contrast, we leverage Flow Matching (FM) to learn a velocity field defined by an Ordinary Differential Equation (ODE), enabling efficient generation of 3D pose samples with only a few integration steps. We propose a novel generative pose estimation framework, FMPose3D, that formulates 3D pose estimation as a conditional distribution transport problem. It continuously transports samples from a standard Gaussian prior to the distribution of plausible 3D poses conditioned only on 2D…
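
The sampling idea described in the abstract, drawing Gaussian noise and integrating a velocity field over a few ODE steps to obtain several pose hypotheses, can be sketched as follows. This is a minimal illustration and not the authors' implementation: `toy_velocity`, `sample_poses`, the 17-joint layout, and the closed-form velocity field (which stands in for a trained network) are all assumptions made for the example.

```python
import numpy as np


def toy_velocity(x, t, cond_2d):
    """Hypothetical stand-in for a trained network v_theta(x, t, c).

    It pulls every sample toward a fixed 3D target built from the 2D input
    (zero depth), using the closed-form field of a straight-line path.
    """
    zeros = np.zeros((cond_2d.shape[0], 1))
    target = np.concatenate([cond_2d, zeros], axis=1)  # shape (J, 3)
    return (target - x) / (1.0 - t)


def sample_poses(velocity_fn, cond_2d, num_hypotheses=4, num_steps=3, seed=0):
    """Draw several 3D pose hypotheses with fixed-step Euler integration."""
    rng = np.random.default_rng(seed)
    num_joints = cond_2d.shape[0]
    # Start every hypothesis from a standard Gaussian sample.
    x = rng.standard_normal((num_hypotheses, num_joints, 3))
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the toy field is finite
        x = x + dt * velocity_fn(x, t, cond_2d)
    return x


# Usage: 17 joints with made-up 2D coordinates.
cond_2d = np.linspace(-1.0, 1.0, 34).reshape(17, 2)
poses = sample_poses(toy_velocity, cond_2d)
print(poses.shape)  # (4, 17, 3)
```

Because the toy field is deterministic given the 2D input, all hypotheses collapse to the same target; a trained conditional model would instead map different noise samples to different plausible poses.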

Code & Models

Models

🤗 DeepLabCut/FMPose3D · model · ♡ 4

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning