Face Anything: 4D Face Reconstruction from Any Image Sequence
Umut Kocasari, Simon Giebenhain, Richard Shaw, Matthias Nießner

TL;DR
This paper introduces a transformer-based method for 4D facial reconstruction from image sequences, achieving high accuracy and temporal stability by predicting depth and canonical facial points in a unified framework.
Contribution
The authors propose a novel canonical facial point prediction approach that transforms dense tracking into a canonical reconstruction problem, enabling real-time, high-fidelity 4D face reconstruction.
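Because every pixel is mapped into one shared canonical space, locating a facial point in another frame becomes a lookup: find the pixel whose canonical coordinate is closest to the query. The sketch below illustrates only that lookup and rests on our own assumptions. The canonical maps are given as nested lists of (x, y, z) tuples, with None marking non-face pixels. Matching is brute-force nearest neighbor. The names `match_by_canonical` and `track_pixel` and the threshold `max_dist` are hypothetical. It is not the authors' implementation, which predicts these maps with a transformer.

```python
import math
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]          # normalized canonical (x, y, z), hypothetical layout
CanonMap = List[List[Optional[Coord]]]      # None marks a non-face (background) pixel


def match_by_canonical(
    query: Coord, canon_map: CanonMap, max_dist: float = 0.05
) -> Optional[Tuple[int, int]]:
    """Return the (row, col) of the pixel whose canonical coordinate is nearest
    to `query`, or None if no face pixel lies strictly within `max_dist`."""
    best: Optional[Tuple[int, int]] = None
    best_d = max_dist
    for r, row in enumerate(canon_map):
        for c, p in enumerate(row):
            if p is None:
                continue  # background pixel: no canonical coordinate
            d = math.dist(query, p)
            if d < best_d:
                best, best_d = (r, c), d
    return best


def track_pixel(
    src_map: CanonMap, src_px: Tuple[int, int], dst_map: CanonMap, max_dist: float = 0.05
) -> Optional[Tuple[int, int]]:
    """Track one pixel from a source frame to a target frame by matching
    its canonical coordinate (hypothetical helper)."""
    r, c = src_px
    q = src_map[r][c]
    if q is None:
        return None  # the source pixel is not on the face
    return match_by_canonical(q, dst_map, max_dist)


frame0 = [[(0.1, 0.2, 0.0), (0.5, 0.5, 0.1)], [None, (0.9, 0.1, 0.0)]]
frame1 = [[None, (0.1, 0.21, 0.0)], [(0.5, 0.49, 0.1), (0.9, 0.1, 0.02)]]
print(track_pixel(frame0, (0, 1), frame1))  # (1, 0)
```

The point of the sketch is that no optical flow or frame-to-frame search is needed. Correspondence across any pair of frames follows from the shared canonical coordinates, which is what lets tracking be recast as canonical reconstruction. A real system would use a spatial index rather than a full scan.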
Findings
Achieves approximately 3× lower correspondence error than previous methods.
Improves depth accuracy by 16%.
Runs inference faster than prior dynamic reconstruction techniques.
Abstract
Accurate reconstruction and tracking of dynamic human faces from image sequences is challenging because non-rigid deformations, expression changes, and viewpoint variations occur simultaneously, creating significant ambiguity in geometry and correspondence estimation. We present a unified method for high-fidelity 4D facial reconstruction based on canonical facial point prediction, a representation that assigns each pixel a normalized facial coordinate in a shared canonical space. This formulation transforms dense tracking and dynamic reconstruction into a canonical reconstruction problem, enabling temporally consistent geometry and reliable correspondences within a single feed-forward model. By jointly predicting depth and canonical coordinates, our method enables accurate depth estimation, temporally stable reconstruction, dense 3D geometry, and robust facial point tracking within a…
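Jointly predicting depth and canonical coordinates gives each face pixel two things: a 3D position in the current frame and an identity in canonical space. The minimal sketch below combines the two. It assumes a standard pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`. This is our assumption, since the abstract does not state the camera model. The function name `lift_frame` is hypothetical.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def lift_frame(
    depth: List[List[float]],
    canon: List[List[Optional[Vec3]]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Tuple[Vec3, Vec3]]:
    """Back-project each face pixel to camera space with a pinhole model and
    pair the resulting 3D point with that pixel's canonical coordinate."""
    points: List[Tuple[Vec3, Vec3]] = []
    for v, (d_row, c_row) in enumerate(zip(depth, canon)):
        for u, (z, can) in enumerate(zip(d_row, c_row)):
            if can is None or z <= 0.0:
                continue  # skip background pixels and invalid depth
            x = (u - cx) * z / fx  # pinhole back-projection, column -> X
            y = (v - cy) * z / fy  # pinhole back-projection, row -> Y
            points.append(((x, y, z), can))
    return points


depth = [[2.0, 0.0], [1.0, 4.0]]
canon = [[(0.1, 0.1, 0.1), (0.2, 0.2, 0.2)], [None, (0.3, 0.3, 0.3)]]
for p3d, c in lift_frame(depth, canon, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
    print(p3d, c)
```

Repeating this for every frame yields one dense point cloud per frame. Points that share a canonical coordinate correspond across time, which is one way to read the combination of dense 3D geometry and facial point tracking described in the abstract.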
