PriorFormer: A Transformer for Real-time Monocular 3D Human Pose Estimation with Versatile Geometric Priors
Mohamed Adjel (LAAS-GEPETTO), Vincent Bonnet (IPAL, LAAS-GEPETTO, CNRS-AIST JRL)

TL;DR
This paper introduces PriorFormer, a lightweight Transformer that lifts short sequences of 2D joint positions from a single camera to 3D human poses using geometric priors, and that adapts to both calibrated and uncalibrated settings with high accuracy and low computational cost.
Contribution
It presents a versatile Transformer-based lifter that incorporates geometric priors and masking to handle missing data, outperforming expert models in 3D human pose estimation.
Findings
Outperforms state-of-the-art by 0.5 cm in accuracy
Operates in 380 μs on GPU and 1800 μs on CPU
Maintains high accuracy with missing priors
Abstract
This paper proposes a new lightweight Transformer-based lifter that maps short sequences of human 2D joint positions to 3D poses using a single camera. The proposed model takes geometric priors as input, including segment lengths and camera intrinsics, and is designed to operate in both calibrated and uncalibrated settings. To this end, a masking mechanism enables the model to ignore missing priors during training and inference. This yields a single versatile network that can adapt to different deployment scenarios, from fully calibrated lab environments to in-the-wild monocular videos without calibration. The model was trained using 3D keypoints from the AMASS dataset with corresponding synthetic 2D data generated by sampling random camera poses and intrinsics. It was then compared to an expert model trained only on complete priors, and the validation was done by conducting an ablation…
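The two mechanisms the abstract describes, generating synthetic 2D keypoints from sampled intrinsics and masking priors that are unavailable, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the focal-length sampling range, the image size, the `femur_length` prior, and the zero-fill-plus-flag encoding of missing priors are all assumptions made for illustration, and the sketch assumes the 3D points are already expressed in the camera frame, so camera pose sampling is left out.

```python
import random

def sample_intrinsics(rng, width=1920, height=1080):
    """Sample pinhole intrinsics. The range below is hypothetical, not the paper's."""
    f = rng.uniform(0.8, 1.6) * width  # focal length in pixels
    return {"fx": f, "fy": f, "cx": width / 2, "cy": height / 2}

def project_points(points_3d, fx, fy, cx, cy):
    """Pinhole projection of camera-frame 3D points (x, y, z) to pixel coordinates."""
    pixels = []
    for x, y, z in points_3d:
        if z <= 0:
            raise ValueError("point must lie in front of the camera (z > 0)")
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels

def mask_priors(priors, available):
    """Return (values, mask) in sorted-name order.

    Missing priors are zero-filled and flagged with 0 so a downstream model
    could learn to ignore them. This encoding is an assumption.
    """
    values, mask = [], []
    for name in sorted(priors):
        present = name in available
        values.append(priors[name] if present else 0.0)
        mask.append(1 if present else 0)
    return values, mask

rng = random.Random(0)
K = sample_intrinsics(rng)

# Two joints in metres, expressed in the camera frame (illustrative values).
joints_3d = [(0.0, 0.0, 3.0), (0.2, -0.5, 3.1)]
joints_2d = project_points(joints_3d, **K)

# Uncalibrated scenario: a segment length is known, the intrinsics are not.
priors = {"fx": K["fx"], "fy": K["fy"], "femur_length": 0.45}
values, mask = mask_priors(priors, available={"femur_length"})
```

In this sketch a point on the optical axis always projects to the principal point, whatever focal length is sampled, and the uncalibrated case is expressed by leaving `fx` and `fy` out of `available`, so their slots are zeroed and flagged in `mask`.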
