TL;DR
This paper introduces a framework for animating speech-driven 3D talking heads on arbitrary mesh topologies, removing the fixed-topology assumption of earlier methods. It also proposes new metrics for evaluating lip-sync quality and reports better performance than fixed-topology baselines.
Contribution
It presents the first method capable of animating 3D faces across arbitrary topologies, including real scans, and introduces improved metrics for lip-sync evaluation.
Findings
Outperforms fixed-topology methods in diverse mesh scenarios
Enables training with unregistered, varying mesh structures
Proposes new metrics for more accurate lip-sync assessment
Abstract
Generating speech-driven 3D talking heads presents numerous challenges; among them is dealing with varying mesh topologies, where no point-wise correspondence exists across the meshes the model can animate. While previous works assume fixed mesh structures, in this work we present the first framework capable of animating 3D faces in arbitrary topologies, including real scanned data. Our approach leverages heat diffusion to predict features that are robust to the mesh topology. We explore two training settings: a registered one, in which meshes in a training sequence share a fixed topology but any mesh can be animated at test time, and a fully unregistered one, which allows effective training with varying mesh structures. Additionally, we highlight the limitations of current evaluation metrics and propose new metrics for better lip-syncing evaluation. An extensive evaluation…
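The abstract does not say how heat diffusion is applied, so the following is only a generic illustration of the idea, not the authors' method. It is a minimal sketch that assumes a uniform graph Laplacian built from triangle faces and a single implicit Euler step; the function names `graph_laplacian` and `diffuse` and the toy mesh are hypothetical. Because the operator is built from whatever connectivity a mesh has, the same code applies to meshes with different vertex counts and face layouts.

```python
import numpy as np

def graph_laplacian(num_vertices, faces):
    """Uniform (combinatorial) Laplacian L = D - A from triangle faces.

    Assumption: uniform edge weights. Geometry-aware weights (for example
    cotangent weights) are a common alternative and are not shown here.
    """
    A = np.zeros((num_vertices, num_vertices))
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            A[a, b] = 1.0
            A[b, a] = 1.0
    return np.diag(A.sum(axis=1)) - A

def diffuse(features, L, t):
    """One implicit Euler heat step: solve (I + t L) u = u0."""
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) + t * L, features)

# Hypothetical toy mesh: two triangles sharing the edge (1, 2).
faces = np.array([[0, 1, 2], [1, 3, 2]])
L = graph_laplacian(4, faces)

# A single feature channel with all "heat" at vertex 0.
u0 = np.array([[1.0], [0.0], [0.0], [0.0]])
u = diffuse(u0, L, t=0.5)
```

In this sketch the diffusion time `t` controls how far a per-vertex feature spreads: a small `t` keeps it local, while a large `t` pushes every vertex toward the mesh-wide average.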
Taxonomy
Methods: Diffusion
