Leveraging Deepfakes to Close the Domain Gap between Real and Synthetic Images in Facial Capture Pipelines
Winnie Lin, Yilin Zhu, Demi Guo, Ron Fedkiw

TL;DR
This paper introduces an end-to-end pipeline that uses deepfake technology and synthetic data to build and track 3D facial models from in-the-wild videos, bridging the domain gap between real and synthetic images.
Contribution
It presents a novel use of deepfake technology within a synthetic multi-view stereo pipeline to enhance facial capture robustness and reduce reliance on real-world ground truth data.
Findings
Robust 3D facial models from in-the-wild videos
Effective synthetic-to-real domain gap mitigation
No need for high-end calibration or ground truth data
Abstract
We propose an end-to-end pipeline for both building and tracking 3D facial models from personalized in-the-wild (cellphone, webcam, YouTube clips, etc.) video data. First, we present a method for automatic data curation and retrieval based on a hierarchical clustering framework typical of collision detection algorithms in traditional computer graphics pipelines. Subsequently, we utilize synthetic turntables and leverage deepfake technology in order to build a synthetic multi-view stereo pipeline for appearance capture that is robust to imperfect synthetic geometry and image misalignment. The resulting model is fit with an animation rig, which is then used to track facial performances. Notably, our novel use of deepfake technology enables us to perform robust tracking of in-the-wild data using differentiable renderers despite a significant synthetic-to-real domain gap. Finally, we…
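To make the data curation and retrieval step more concrete, the following is a minimal sketch of a bounding-volume hierarchy, a hierarchical structure commonly used in collision detection, applied here to frame retrieval. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which per-frame features are clustered, so the sketch assumes each frame is described by a hypothetical (yaw, pitch) head-pose vector in degrees, and the names `Node`, `build`, `box_dist2`, and `nearest` are invented for this example.

```python
from dataclasses import dataclass
from math import inf
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]


@dataclass
class Node:
    lo: Vec                          # per-axis minimum of the bounding box
    hi: Vec                          # per-axis maximum of the bounding box
    indices: List[int]               # frame indices (non-empty only at leaves)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: Sequence[Vec], indices: List[int], leaf_size: int = 2) -> Node:
    """Recursively split frames along the widest axis of their bounding box."""
    dims = len(points[indices[0]])
    lo = tuple(min(points[i][d] for i in indices) for d in range(dims))
    hi = tuple(max(points[i][d] for i in indices) for d in range(dims))
    if len(indices) <= leaf_size:
        return Node(lo, hi, list(indices))
    axis = max(range(dims), key=lambda d: hi[d] - lo[d])
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2          # median split keeps the tree balanced
    return Node(lo, hi, [],
                build(points, ordered[:mid], leaf_size),
                build(points, ordered[mid:], leaf_size))


def box_dist2(node: Node, q: Vec) -> float:
    """Squared distance from query q to the node's box (0 if q is inside)."""
    return sum(max(l - x, 0.0, x - h) ** 2
               for l, h, x in zip(node.lo, node.hi, q))


def nearest(node: Node, points: Sequence[Vec], q: Vec,
            best: Tuple[float, int] = (inf, -1)) -> Tuple[float, int]:
    """Return (squared distance, frame index) of the frame closest to q."""
    if box_dist2(node, q) >= best[0]:
        return best                  # prune: this box cannot beat the current best
    if node.left is None:            # leaf: test the stored frames directly
        for i in node.indices:
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], q))
            if d2 < best[0]:
                best = (d2, i)
        return best
    # Visit the closer child first so the farther one is more likely pruned.
    for child in sorted((node.left, node.right), key=lambda c: box_dist2(c, q)):
        best = nearest(child, points, q, best)
    return best


# Hypothetical (yaw, pitch) descriptors for six video frames.
poses = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0),
         (30.0, 5.0), (-20.0, 15.0), (5.0, -25.0)]
tree = build(poses, list(range(len(poses))))
dist2, frame = nearest(tree, poses, (28.0, 4.0))
```

Querying with a target pose returns the index of the stored frame whose descriptor is closest, which is one plausible way to retrieve a real frame that matches a synthetic render. The pruning test in `nearest` mirrors how collision detection skips whole subtrees whose bounding boxes cannot contain a hit.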
Taxonomy
Topics: Face recognition and analysis · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
