Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction

Simon Giebenhain; Tobias Kirschstein; Martin Rünz; Lourdes Agapito; Matthias Nießner

arXiv:2505.00615·cs.CV·May 2, 2025

TL;DR

Pixel3DMM leverages vision transformers and foundation model features to improve single-image 3D face reconstruction, achieving higher geometric accuracy across diverse expressions and ethnicities.

Contribution

The paper introduces Pixel3DMM, a novel approach combining vision transformers and foundation model features for enhanced 3D face reconstruction from a single image.

Findings

01

Outperforms baselines by over 15% in geometric accuracy.

02

Introduces a new benchmark with diverse expressions and ethnicities.

03

Employs a novel FLAME fitting optimization for 3DMM parameters.

Abstract

We address the 3D reconstruction of human faces from a single RGB image. To this end, we propose Pixel3DMM, a set of highly generalized vision transformers that predict per-pixel geometric cues in order to constrain the optimization of a 3D morphable face model (3DMM). We exploit the latent features of the DINO foundation model and introduce tailored surface normal and uv-coordinate prediction heads. We train our model by registering three high-quality 3D face datasets against the FLAME mesh topology, which results in a total of over 1,000 identities and 976K images. For 3D face reconstruction, we propose a FLAME fitting optimization that solves for the 3DMM parameters from the uv-coordinate and normal estimates. To evaluate our method, we introduce a new benchmark for single-image face reconstruction, which features highly diverse facial expressions, viewing angles, and ethnicities.…
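The fitting step described in the abstract can be illustrated with a toy example. The sketch below is not the paper's FLAME optimization: it assumes a purely linear morphable model, an orthographic camera, and one 2D target per vertex (standing in for correspondences derived from per-pixel uv predictions), and it ignores the normal term entirely. The function name `fit_linear_3dmm`, the ridge weight `reg`, and the random data are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_linear_3dmm(mean, basis, target_2d, reg=1e-3):
    """Fit coefficients of a toy linear morphable model to 2D targets.

    mean:      (V, 3) mean vertex positions
    basis:     (V, 3, K) linear basis, so vertices = mean + basis @ params
    target_2d: (V, 2) image-plane position for each vertex
               (assumed orthographic camera: image x/y equal vertex x/y)
    reg:       ridge weight pulling the coefficients toward zero (assumption)
    """
    num_vertices, _, num_params = basis.shape
    # Orthographic projection keeps only the x and y rows of the basis.
    A = basis[:, :2, :].reshape(num_vertices * 2, num_params)
    b = (target_2d - mean[:, :2]).reshape(num_vertices * 2)
    # Ridge-regularized normal equations: (A^T A + reg I) p = A^T b.
    lhs = A.T @ A + reg * np.eye(num_params)
    rhs = A.T @ b
    return np.linalg.solve(lhs, rhs)


# Synthetic check: generate targets from known coefficients and recover them.
rng = np.random.default_rng(0)
V, K = 200, 5
mean = rng.normal(size=(V, 3))
basis = rng.normal(size=(V, 3, K))
true_params = rng.normal(size=K)
vertices = mean + basis @ true_params   # (V, 3)
target_2d = vertices[:, :2]             # noise-free orthographic projection

est = fit_linear_3dmm(mean, basis, target_2d, reg=1e-8)
recovered = bool(np.allclose(est, true_params, atol=1e-5))
```

Because the toy model and camera are both linear, the fit reduces to a single least-squares solve. A real FLAME fit involves pose, expression, and camera parameters with nonlinear dependencies, so it would typically need an iterative optimizer instead.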

Taxonomy

Methods: Attention Is All You Need · Layer Normalization · Softmax · Residual Connection · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer · self-DIstillation with NO labels · Sparse Evolutionary Training