# Text-based Editing of Talking-head Video

**Authors:** Ohad Fried, Ayush Tewari, Michael Zollhöfer, Adam Finkelstein, Eli Shechtman, Dan B Goldman, Kyle Genova, Zeyu Jin, Christian Theobalt, Maneesh Agrawala

arXiv: 1906.01524 · 2019-06-05

## TL;DR

This paper introduces a novel transcript-based editing method for talking-head videos that seamlessly modifies speech content while maintaining realistic audio-visual continuity, enabling diverse edits like word changes and translation.

## Contribution

The paper presents an automated, transcript-driven editing pipeline that combines per-frame annotation, segment selection and stitching, and neural rendering to produce realistic edited talking-head videos without jump cuts.

## Key findings

- Edits the speech content of a talking-head video through its transcript: words can be added, removed, or altered.
- Demonstrates language translation and full sentence synthesis.
- Maintains a seamless audio-visual flow (no jump cuts) across edits.

## Abstract

Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
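The segment-selection step described in the abstract can be illustrated with a toy sketch. This is not the paper's optimization strategy, whose objective is not given in this summary. It is a simplified stand-in that assumes the corpus and the edited transcript are both available as phoneme sequences, and that a reasonable proxy cost is the number of contiguous corpus segments needed to cover the edit (fewer segments means fewer seams to stitch). The names `find_run` and `select_segments` and the ARPAbet-style phoneme labels are hypothetical.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start, end) indices into the corpus, end exclusive


def find_run(corpus: List[str], pattern: List[str]) -> int:
    """Return the first corpus index where `pattern` occurs contiguously, or -1."""
    n = len(pattern)
    for s in range(len(corpus) - n + 1):
        if corpus[s:s + n] == pattern:
            return s
    return -1


def select_segments(corpus: List[str], target: List[str]) -> Optional[List[Segment]]:
    """Cover `target` with the fewest contiguous corpus segments.

    Assumed proxy cost: one unit per segment. Returns None if some target
    phoneme never appears in the corpus.
    """
    inf = float("inf")
    cost = [0.0] + [inf] * len(target)  # cost[i]: min segments covering target[:i]
    back: List[Optional[Tuple[int, int]]] = [None] * (len(target) + 1)
    for i in range(1, len(target) + 1):
        for j in range(i):
            if cost[j] == inf:
                continue
            s = find_run(corpus, target[j:i])
            if s >= 0 and cost[j] + 1 < cost[i]:
                cost[i] = cost[j] + 1
                back[i] = (j, s)  # target[j:i] is taken from corpus[s:s + i - j]
    if cost[-1] == inf:
        return None
    segments: List[Segment] = []
    i = len(target)
    while i > 0:
        j, s = back[i]
        segments.append((s, s + i - j))
        i = j
    return segments[::-1]


# Toy corpus: the phonemes of "hello world"; the edit asks for a new sequence.
corpus = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
target = ["W", "ER", "L", "OW"]
segments = select_segments(corpus, target)
rebuilt = [p for (a, b) in segments for p in corpus[a:b]]
```

In this toy case the target is not a contiguous run of the corpus, so two segments are needed, and concatenating them reproduces the target phonemes. In the pipeline the abstract describes, the annotated face parameters of the selected segments would then be stitched and rendered; that part is outside the scope of this sketch.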

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.01524/full.md

## Figures

22 figures with captions in the complete paper: https://tomesphere.com/paper/1906.01524/full.md

## References

77 references — full list in the complete paper: https://tomesphere.com/paper/1906.01524/full.md

---
Source: https://tomesphere.com/paper/1906.01524