TL;DR
This paper introduces Head2Head, a neural network architecture for facial reenactment that exploits the structure of facial motion, especially mouth motion, and enforces temporal consistency to transfer a source actor's expressions, pose, and gaze to a target video photo-realistically.
Contribution
The paper presents a novel neural head synthesis method that improves facial reenactment by focusing on facial motion structure and temporal coherence.
Findings
Reported to be more accurate than state-of-the-art methods while producing photo-realistic output.
Transfers the expressions, pose, and gaze of a source actor to a target video.
Ensures temporal consistency in generated videos.
Abstract
In this paper, we propose a novel machine learning architecture for facial reenactment. In particular, contrary to the model-based approaches or recent frame-based methods that use Deep Convolutional Neural Networks (DCNNs) to generate individual frames, we propose a novel method that (a) exploits the special structure of facial motion (paying particular attention to mouth motion) and (b) enforces temporal consistency. We demonstrate that the proposed method can transfer facial expressions, pose and gaze of a source actor to a target video in a photo-realistic fashion more accurately than state-of-the-art methods.
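To make the idea of temporal consistency concrete, here is a minimal sketch of one way to quantify flicker in a generated clip: compare its frame-to-frame changes against those of a reference clip. This is an illustrative metric only, not the loss or evaluation protocol used in the paper; the `Frame` representation (a flattened grayscale frame) and the function name `temporal_inconsistency` are assumptions introduced for this example.

```python
from typing import List, Sequence

# Hypothetical representation: one frame as a flat sequence of pixel intensities.
Frame = Sequence[float]


def temporal_inconsistency(generated: List[Frame], reference: List[Frame]) -> float:
    """Mean squared gap between the frame-to-frame changes of two clips.

    Illustrative only; this is not the method described in the paper.
    A value of 0.0 means the generated clip changes over time exactly
    as the reference clip does.
    """
    if len(generated) != len(reference):
        raise ValueError("clips must have the same number of frames")
    if len(generated) < 2:
        raise ValueError("need at least two frames to measure motion")

    total = 0.0
    count = 0
    for t in range(1, len(generated)):
        pixels = zip(generated[t - 1], generated[t], reference[t - 1], reference[t])
        for g_prev, g_cur, r_prev, r_cur in pixels:
            # Difference between the generated and the reference temporal change.
            gap = (g_cur - g_prev) - (r_cur - r_prev)
            total += gap * gap
            count += 1

    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count


if __name__ == "__main__":
    reference = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    flicker = [[0.0, 0.0], [2.0, 2.0], [2.0, 2.0]]
    print(temporal_inconsistency(reference, reference))
    print(temporal_inconsistency(flicker, reference))
```

A frame-based generator that synthesizes each frame independently has nothing that keeps this kind of quantity small, which is the motivation the abstract gives for enforcing temporal consistency explicitly.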
