SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing

Lingyu Xiong; Xize Cheng; Jintao Tan; Xianjia Wu; Xiandong Li; Lei Zhu; Fei Ma; Minglei Li; Huang Xu; Zhihu Hu

arXiv:2409.03605·cs.CV·September 6, 2024

TL;DR

SegTalker introduces a segmentation-based framework for talking face generation that preserves textures, enables local editing, and maintains lip synchronization, improving over existing methods in texture detail and temporal consistency.

Contribution

The paper proposes a novel segmentation-based approach with mask-guided local editing for talking face generation, enhancing texture preservation and editing capabilities.

Findings

01 Effective preservation of texture details.

02 High temporal consistency in generated videos.

03 Superior performance in lip synchronization.

Abstract

Audio-driven talking face generation aims to synthesize video with lip movements synchronized to input audio. However, current generative techniques struggle to preserve intricate regional textures (skin, teeth). To address this challenge, we propose a novel framework called SegTalker that decouples lip movements from image textures by introducing segmentation as an intermediate representation. Specifically, given the mask of an image produced by a parsing network, we first use the speech to drive the mask and generate a talking segmentation. Then we disentangle the semantic regions of the image into style codes using a mask-guided encoder. Finally, we inject the previously generated talking segmentation and the style codes into a mask-guided StyleGAN to synthesize each video frame. In this way, most textures are fully preserved. Moreover, our approach can inherently achieve…
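The idea of turning semantic regions into per-region style codes can be illustrated with a toy sketch. It is not the paper's implementation: the mask-guided encoder and StyleGAN generator are learned networks, whereas the code below simply averages hand-written feature vectors inside each mask region and then paints those averages onto a different mask. The function names `region_style_codes` and `broadcast_codes`, the label meanings, and the tiny 2x2 inputs are all hypothetical and chosen only for illustration.

```python
def region_style_codes(features, mask):
    """Average the feature vectors of all pixels that share a mask label.

    features: H x W grid of per-pixel feature vectors (lists of floats)
    mask:     H x W grid of integer region labels (assumed: 0 = skin, 1 = lips)
    Returns {label: mean feature vector}, one toy "style code" per region.
    """
    sums = {}
    counts = {}
    for feat_row, mask_row in zip(features, mask):
        for feat, label in zip(feat_row, mask_row):
            if label not in sums:
                sums[label] = [0.0] * len(feat)
                counts[label] = 0
            sums[label] = [s + f for s, f in zip(sums[label], feat)]
            counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def broadcast_codes(codes, new_mask):
    """Paint each region's style code onto a new mask.

    Assumes every label in new_mask was present in the source mask;
    an unseen label raises KeyError.
    """
    return [[codes[label] for label in row] for row in new_mask]


# Toy 2x2 "image": top row is region 0, bottom row is region 1.
features = [[[1.0, 0.0], [3.0, 0.0]],
            [[0.0, 2.0], [0.0, 4.0]]]
mask = [[0, 0],
        [1, 1]]
codes = region_style_codes(features, mask)

# A stand-in for a speech-driven mask in which region 1 has grown.
talking_mask = [[0, 1],
                [1, 1]]
styled = broadcast_codes(codes, talking_mask)
```

Because the style codes are stored per region rather than per pixel, changing the mask moves the region boundary while each region keeps its own code. This is the intuition behind separating lip motion (the mask) from texture (the codes).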

Taxonomy

Topics: Face recognition and analysis

Methods: Dense Connections · Feedforward Network · R1 Regularization · Adaptive Instance Normalization · Convolution · StyleGAN