VGGT-HPE: Reframing Head Pose Estimation as Relative Pose Prediction

Vasiliki Vasileiou; Panagiotis P. Filntisis; Petros Maragos; Kostas Daniilidis

arXiv:2604.10106·cs.CV·April 14, 2026

TL;DR

VGGT-HPE is a relative head pose estimator: rather than regressing an absolute pose from a single image, it predicts the rigid transformation between two observed head configurations. It is finetuned solely on synthetic facial renderings and achieves state-of-the-art results without any real-world training data.

Contribution

The paper proposes a relative head pose estimation approach, finetuned only on synthetic data, that outperforms traditional absolute regression methods and thereby supports the case for relative prediction.

Findings

01

Achieves state-of-the-art results on the BIWI benchmark.

02

Relative prediction outperforms absolute regression, especially on difficult poses.

03

Training on synthetic data alone, with zero real-world training data, suffices for high accuracy.

Abstract

Monocular head pose estimation is traditionally formulated as direct regression from a single image to an absolute pose. This paradigm forces the network to implicitly internalize a dataset-specific canonical reference frame. In this work, we argue that predicting the relative rigid transformation between two observed head configurations is a fundamentally easier and more robust formulation. We introduce VGGT-HPE, a relative head pose estimator built upon a general-purpose geometry foundation model. Finetuned exclusively on synthetic facial renderings, our method sidesteps the need for an implicit anchor by reducing the problem to estimating a geometric displacement from an explicitly provided anchor with a known pose. As a practical benefit, the relative formulation also allows the anchor to be chosen at test time - for instance, a near-neutral frame or a temporally adjacent one - so…
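The relative formulation described in the abstract can be illustrated with a minimal sketch, restricted to the rotation part of the rigid transformation. It is not the authors' implementation: the Euler convention (`Rz(roll) @ Ry(yaw) @ Rx(pitch)`), the composition order `R_target = R_rel @ R_anchor`, and all function names are assumptions made here for illustration. The relative rotation that a network would predict is replaced by its ground-truth value, so the example only shows how an absolute pose is recovered from an anchor with a known pose.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    """Transpose a 3x3 matrix (the inverse, for a rotation matrix)."""
    return [list(row) for row in zip(*a)]


def euler_to_matrix(yaw_deg, pitch_deg, roll_deg):
    """Build a rotation matrix; the axis order is an assumed convention."""
    y, p, r = (math.radians(v) for v in (yaw_deg, pitch_deg, roll_deg))
    rx = [[1, 0, 0],
          [0, math.cos(p), -math.sin(p)],
          [0, math.sin(p), math.cos(p)]]
    ry = [[math.cos(y), 0, math.sin(y)],
          [0, 1, 0],
          [-math.sin(y), 0, math.cos(y)]]
    rz = [[math.cos(r), -math.sin(r), 0],
          [math.sin(r), math.cos(r), 0],
          [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def relative_rotation(r_anchor, r_target):
    """Rotation taking the anchor orientation to the target orientation."""
    return matmul(r_target, transpose(r_anchor))


def compose_absolute(r_anchor, r_rel):
    """Recover the absolute target rotation from anchor and relative rotation."""
    return matmul(r_rel, r_anchor)


def geodesic_deg(a, b):
    """Angle in degrees of the rotation separating a and b."""
    r = matmul(a, transpose(b))
    cos_angle = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


# Anchor: a near-neutral frame whose absolute pose is known.
r_anchor = euler_to_matrix(5.0, -2.0, 1.0)
# Target: the frame whose absolute pose we want.
r_target = euler_to_matrix(60.0, 10.0, -5.0)

# Stand-in for the network output: the ground-truth relative rotation.
r_rel = relative_rotation(r_anchor, r_target)

# The absolute pose follows by composing the displacement with the anchor.
r_recovered = compose_absolute(r_anchor, r_rel)
error_deg = geodesic_deg(r_recovered, r_target)
```

With an exact relative rotation, `error_deg` is zero up to floating-point error. In practice the error of the recovered absolute pose would be the error of the predicted displacement. Because the anchor is an explicit input, it can be swapped at test time without changing the rest of the computation.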

Code & Models

Repositories

https://vasilikivas.github.io/VGGT-HPE
