PriorFormer: A Transformer for Real-time Monocular 3D Human Pose Estimation with Versatile Geometric Priors

Mohamed Adjel (LAAS-GEPETTO); Vincent Bonnet (IPAL; LAAS-GEPETTO; CNRS-AIST JRL)

arXiv:2508.18238 · cs.CV · August 26, 2025

TL;DR

This paper introduces PriorFormer, a lightweight Transformer that lifts short sequences of 2D joint positions from a single camera to 3D human poses. It accepts optional geometric priors (segment lengths and camera intrinsics), so one network covers both calibrated and uncalibrated settings at low computational cost.

Contribution

The paper presents a versatile Transformer-based lifter that takes geometric priors as input and uses masking to handle missing priors, outperforming expert models in 3D human pose estimation.
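The page does not describe how the masking is implemented. A minimal sketch of one common approach, attention masking, is given below under that assumption: a prior that is unavailable receives exactly zero attention weight, so it cannot influence the output. The token layout and the names `availability_mask` and `masked_softmax` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical token layout: one token for the 2D pose sequence and one per prior.
TOKEN_NAMES = ["pose_2d", "segment_lengths", "intrinsics"]


def availability_mask(
    segment_lengths: Optional[Sequence[float]],
    intrinsics: Optional[Sequence[float]],
) -> List[bool]:
    """Return one flag per token; True means the token is available."""
    # The 2D pose token is always present; each prior may be missing (None).
    return [True, segment_lengths is not None, intrinsics is not None]


def masked_softmax(scores: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over scores in which masked-out entries get exactly zero weight."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        return [0.0] * len(scores)
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Uncalibrated-style case: intrinsics known, segment lengths missing.
mask = availability_mask(None, (1000.0, 1000.0, 320.0, 240.0))
weights = masked_softmax([1.0, 2.0, 3.0], mask)
```

In this usage the `segment_lengths` token is masked out, so its weight is zero and the remaining weights are renormalized over the available tokens.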

Findings

01. Outperforms the state of the art by 0.5 cm in accuracy

02. Operates in 380 μs on GPU and 1800 μs on CPU

03. Maintains high accuracy with missing priors

Abstract

This paper proposes a new lightweight Transformer-based lifter that maps short sequences of human 2D joint positions to 3D poses using a single camera. The proposed model takes as input geometric priors, including segment lengths and camera intrinsics, and is designed to operate in both calibrated and uncalibrated settings. To this end, a masking mechanism enables the model to ignore missing priors during training and inference. This yields a single versatile network that can adapt to different deployment scenarios, from fully calibrated lab environments to in-the-wild monocular videos without calibration. The model was trained using 3D keypoints from the AMASS dataset, with corresponding synthetic 2D data generated by sampling random camera poses and intrinsics. It was then compared to an expert model trained only on complete priors, and the validation was done by conducting an ablation…
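The synthetic 2D data generation mentioned in the abstract can be illustrated with a standard pinhole projection. The sketch below is not the authors' pipeline: the sampling ranges, the y-up axis convention, the yaw-only camera rotation, and the names `sample_camera`, `world_to_camera`, `project`, and `synthesize_2d` are all assumptions chosen for illustration.

```python
import math
import random


def sample_camera(rng: random.Random) -> dict:
    """Draw hypothetical intrinsics and a simple camera pose (illustrative ranges)."""
    focal = rng.uniform(500.0, 1500.0)  # focal length in pixels
    return {
        "fx": focal,
        "fy": focal,
        "cx": rng.uniform(300.0, 340.0),  # principal point in pixels
        "cy": rng.uniform(220.0, 260.0),
        "yaw": rng.uniform(-math.pi, math.pi),  # rotation about the vertical axis
        "distance": rng.uniform(2.0, 6.0),  # camera-to-subject distance in metres
    }


def world_to_camera(point, yaw: float, distance: float):
    """Rotate a world point about the y axis, then shift it along +z."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z, y, -s * x + c * z + distance)


def project(point_cam, cam: dict):
    """Pinhole projection: u = fx * X / Z + cx, v = fy * Y / Z + cy."""
    X, Y, Z = point_cam
    if Z <= 0.0:
        raise ValueError("point is behind the camera")
    return (cam["fx"] * X / Z + cam["cx"], cam["fy"] * Y / Z + cam["cy"])


def synthesize_2d(joints_3d, rng: random.Random):
    """Project 3D joints through one randomly sampled camera."""
    cam = sample_camera(rng)
    joints_2d = [
        project(world_to_camera(j, cam["yaw"], cam["distance"]), cam)
        for j in joints_3d
    ]
    return joints_2d, cam


# Two toy joints near the origin stand in for a 3D keypoint set.
rng = random.Random(0)
joints_2d, cam = synthesize_2d([(0.0, 0.0, 0.0), (0.2, 0.5, 0.1)], rng)
```

A joint at the world origin lies on the optical axis in this setup, so it projects onto the sampled principal point `(cx, cy)`, which gives a quick sanity check of the projection.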
