Context-Aware Talking-Head Video Editing

Songlin Yang; Wei Wang; Jun Ling; Bo Peng; Xu Tan; Jing Dong

arXiv:2308.00462·cs.MM·September 21, 2023

TL;DR

This paper introduces an efficient framework for talking-head video editing that uses the video context to achieve accurate lip synchronization, smooth motion, and disentangled control of verbal and non-verbal cues, outperforming prior 3DMM-based and NeRF-based methods.

Contribution

The work presents a new framework combining motion prediction and neural rendering for efficient, high-quality talking-head video editing with disentangled control.

Findings

01

Achieves smoother, more realistic video edits with higher lip-sync accuracy.

02

Requires less data and training time than previous methods.

03

Provides better generalization to unseen speech and identities.

Abstract

Talking-head video editing aims to efficiently insert, delete, and substitute words in a pre-recorded video through a text transcript editor. The key challenge for this task is obtaining an editing model that generates new talking-head video clips that have both accurate lip synchronization and smooth motion. Previous approaches, including 3DMM-based (3D Morphable Model) methods and NeRF-based (Neural Radiance Field) methods, are sub-optimal in that they either require minutes of source video and days of training time or lack disentangled control of verbal (e.g., lip motion) and non-verbal (e.g., head pose and expression) representations for video clip insertion. In this work, we fully utilize the video context to design a novel framework for talking-head video editing, which achieves efficiency, disentangled motion control, and sequential smoothness.…
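
To make the three transcript operations named in the abstract concrete, the following is a minimal sketch of how a word-level insert, delete, or substitute could be detected by diffing the original transcript against the edited one. It is not the paper's method: the function name `transcript_edits` and the whitespace tokenization are assumptions made for illustration, and the sketch covers only the text side of the task, not video generation.

```python
from difflib import SequenceMatcher


def transcript_edits(original, edited):
    """Return word-level edit operations between two transcripts.

    Hypothetical helper for illustration only: it classifies text edits
    and does not touch video frames or audio.
    """
    src = original.split()  # naive whitespace tokenization (assumption)
    dst = edited.split()
    ops = []
    matcher = SequenceMatcher(a=src, b=dst, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged words need no new video clip
        # difflib calls a substitution "replace"; rename to match the abstract
        name = "substitute" if tag == "replace" else tag
        ops.append((name, src[i1:i2], dst[j1:j2]))
    return ops


if __name__ == "__main__":
    for op in transcript_edits("the quick brown fox", "the slow brown fox jumps"):
        print(op)
```

Each returned tuple holds the operation name, the affected words in the original transcript, and their replacement in the edited one. An editing system would then need to generate or remove the video frames aligned with those words.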

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Video Analysis and Summarization