FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model
Ziyu Yao, Xuxin Cheng, Zhiqi Huang

TL;DR
FD2Talk introduces a facial decoupled diffusion model for talking head generation, effectively separating motion and appearance to improve quality, diversity, and accuracy over previous methods.
Contribution
The paper proposes a novel multi-stage diffusion framework that decouples facial motion and appearance, enhancing generation quality and detail preservation in talking head synthesis.
Findings
Outperforms previous state-of-the-art methods in quality and diversity.
Accurately predicts facial motion from audio using a Diffusion Transformer.
Effectively encodes appearance to guide realistic frame generation.
Abstract
Talking head generation is a significant research topic that still faces numerous challenges. Previous works often adopt generative adversarial networks or regression models, which suffer from limited generation quality and the average facial shape problem. Although diffusion models show impressive generative ability, their exploration in talking head generation remains unsatisfactory. This is because existing approaches either use the diffusion model solely to obtain an intermediate representation and then employ a separate pre-trained renderer, or overlook the feature decoupling of complex facial details such as expressions, head poses, and appearance textures. Therefore, we propose a Facial Decoupled Diffusion model for Talking head generation, called FD2Talk, which fully leverages the advantages of diffusion models and decouples the complex facial details across multiple stages. Specifically, we separate…
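For readers unfamiliar with the diffusion models the abstract refers to, the following is a minimal sketch of the standard DDPM forward (noising) process, which gradually corrupts a clean sample with Gaussian noise. It is a generic illustration, not FD2Talk's implementation: the linear schedule, the step count, and the three-element `motion` vector standing in for facial motion coefficients are all assumptions made for the example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

# Hypothetical stand-in for a clean motion vector; not the paper's representation.
motion = [0.3, -0.1, 0.8]
noisy, eps = q_sample(motion, 500, alpha_bars, rng)
```

A denoising network is then trained to recover the noise `eps` (or the clean sample) from `noisy` and the step index, optionally conditioned on a signal such as audio. How FD2Talk conditions and stages this process is described in the full paper, not in the excerpt above.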
Taxonomy
Topics: Face recognition and analysis · Social Robot Interaction and HRI · Speech and Audio Processing
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax
