Flow Matching for Probabilistic Monocular 3D Human Pose Estimation

Cuong Le; Pavló Melnyk; Bastian Wandt; Mårten Wadenbäck

arXiv:2601.16763 · cs.CV · January 26, 2026

TL;DR

This paper introduces FMPose, a probabilistic method for monocular 3D human pose estimation using flow matching and optimal transport, achieving faster and more accurate results than diffusion-based approaches.

Contribution

FMPose is the first to apply flow matching generative modeling with optimal transport for 3D human pose estimation from monocular images.

Findings

01. FMPose reports major improvements over the state of the art on the Human3.6M, MPI-INF-3DHP, and 3DPW datasets.

02. FMPose produces faster and more accurate 3D pose estimates than diffusion-based methods.

03. Probabilistic modeling captures the uncertainty and ambiguity of the estimated poses.

Abstract

Recovering 3D human poses from a monocular camera view is a highly ill-posed problem due to depth ambiguity. Earlier studies on lifting 2D poses to 3D often produce incorrect yet overconfident 3D estimates. To mitigate this problem, emerging probabilistic approaches treat the 3D estimate as a distribution, taking the uncertainty of the poses into account. In the same vein, we propose FMPose, a probabilistic 3D human pose estimation method based on the flow matching generative approach. Conditioned on the 2D cues, the flow matching scheme learns the optimal transport from a simple source distribution to the distribution of plausible 3D human poses via continuous normalizing flows. The 2D lifting condition is modeled via graph convolutional networks, leveraging the learnable connections between human body joints as the graph structure for feature…
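The flow matching scheme described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the straight-line interpolation between a source sample and a target pose that is commonly used in flow matching, together with Euler integration at sampling time; the excerpt does not give the paper's exact formulation, so this is not FMPose itself. The names `fm_training_pair`, `fm_loss`, `euler_sample`, and `velocity_fn` are hypothetical, as is the 17-joint pose shape, and the conditioning argument `cond` stands in for the 2D features that the paper models with graph convolutional networks.

```python
import numpy as np


def fm_training_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, plus its velocity.

    x0: source samples, shape (B, J, 3), e.g. Gaussian noise (assumption).
    x1: target 3D poses, shape (B, J, 3).
    t:  times in [0, 1], shape (B,).
    """
    t = t.reshape(-1, 1, 1)           # broadcast over joints and coordinates
    x_t = (1.0 - t) * x0 + t * x1     # linear interpolation between endpoints
    u_t = x1 - x0                     # constant velocity along a straight path
    return x_t, u_t


def fm_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between predicted and target velocities."""
    x_t, u_t = fm_training_pair(x0, x1, t)
    pred = velocity_fn(x_t, t, cond)  # hypothetical model: (x_t, t, cond) -> velocity
    return float(np.mean((pred - u_t) ** 2))


def euler_sample(velocity_fn, x0, cond, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x
```

Under these assumptions, drawing several source samples `x0` for the same 2D input and integrating each one yields a set of 3D hypotheses, which is how a sampler of this kind would expose the pose distribution rather than a single point estimate.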

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning