SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing
Lingyu Xiong, Xize Cheng, Jintao Tan, Xianjia Wu, Xiandong Li, Lei Zhu, Fei Ma, Minglei Li, Huang Xu, Zhihu Hu

TL;DR
SegTalker introduces a segmentation-based framework for talking face generation that preserves textures, enables local editing, and maintains lip synchronization, improving over existing methods in texture detail and temporal consistency.
Contribution
The paper proposes a novel segmentation-based approach with mask-guided local editing for talking face generation, enhancing texture preservation and editing capabilities.
Findings
Effective preservation of texture details.
High temporal consistency in generated videos.
Superior performance in lip synchronization.
Abstract
Audio-driven talking face generation aims to synthesize video with lip movements synchronized to input audio. However, current generative techniques struggle to preserve intricate regional textures (skin, teeth). To address this challenge, we propose a novel framework called SegTalker that decouples lip movements from image textures by introducing segmentation as an intermediate representation. Specifically, given the mask of an image extracted by a parsing network, we first use the speech to drive the mask and generate a talking segmentation. Then we disentangle the semantic regions of the image into style codes using a mask-guided encoder. Finally, we inject the generated talking segmentation and the style codes into a mask-guided StyleGAN to synthesize video frames. In this way, most textures are fully preserved. Moreover, our approach can inherently achieve…
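The decoupling described in the abstract can be illustrated with a small sketch. It is not the paper's mask-guided encoder or StyleGAN, whose details are not given here. It assumes, purely for illustration, that a region's style code is the average of a feature map inside that region of the reference mask, and that the codes are then laid out according to a different, speech-driven mask. The function names `region_style_codes` and `broadcast_codes`, the toy arrays, and the use of NumPy are all hypothetical choices.

```python
import numpy as np


def region_style_codes(features, mask, num_regions):
    """Average a feature map inside each semantic region (illustrative assumption).

    features: (H, W, C) float array, standing in for an encoder output.
    mask:     (H, W) int array of region labels in [0, num_regions).
    Returns a (num_regions, C) array; regions absent from the mask stay zero.
    """
    channels = features.shape[-1]
    codes = np.zeros((num_regions, channels), dtype=features.dtype)
    for r in range(num_regions):
        region = mask == r
        if region.any():
            # Boolean indexing yields (num_pixels, C); average over the pixels.
            codes[r] = features[region].mean(axis=0)
    return codes


def broadcast_codes(codes, mask):
    """Lay each region's code out according to a (possibly different) mask."""
    return codes[mask]  # integer indexing gives an (H, W, C) array


# Toy reference frame: region 0 on the top row, region 1 on the bottom row.
features = np.array([[[1.0], [3.0]],
                     [[10.0], [20.0]]])
reference_mask = np.array([[0, 0],
                           [1, 1]])

# Hypothetical "talking" mask in which region 1 has grown by one pixel.
talking_mask = np.array([[0, 1],
                         [1, 1]])

codes = region_style_codes(features, reference_mask, num_regions=3)
styled = broadcast_codes(codes, talking_mask)
```

In this toy setup the textures (codes) come only from the reference frame, while the layout comes only from the new mask, which is the separation of texture and motion that the abstract describes. Replacing the codes of a single region before broadcasting would correspond to a local edit.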
Taxonomy
Topics: Face recognition and analysis
Methods: Dense Connections · Feedforward Network · R1 Regularization · Adaptive Instance Normalization · Convolution · StyleGAN
