# Video-to-Video Translation for Visual Speech Synthesis

**Authors:** Michail C. Doukas, Viktoriia Sharmanska, Stefanos Zafeiriou

arXiv: 1905.12043 · 2019-05-30

## TL;DR

This paper introduces ViSpGAN, a character-based GAN architecture that translates a video of one spoken word into a video of a different word, enabling multi-domain visual speech synthesis with a vocabulary of 500 words.

## Contribution

The paper is the first to demonstrate video-to-video translation for visual speech synthesis with a large (500-word) vocabulary, a setting in which an adaptation of the image-to-image translation model StarGAN falls short.

## Key findings

- First to demonstrate video-to-video translation with a vocabulary of 500 words
- Proposed a character-based GAN architecture (ViSpGAN) that conditions on character encodings of words
- Achieved multi-domain translation for visual speech synthesis, with each word treated as its own domain
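Treating each word as a domain means a single generator has to be told which word to produce. StarGAN, the baseline named in the abstract, does this by spatially replicating the target domain label and concatenating it with the input image. The sketch below shows one way that idea could carry over to video, by appending the code to every frame. The per-frame extension, the `(T, C, H, W)` layout and the function name `condition_video` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def condition_video(video: np.ndarray, target_code: np.ndarray) -> np.ndarray:
    """Append a target-domain code to every frame of a video (hypothetical helper).

    video: array of shape (T, C, H, W), i.e. T frames with C channels each.
    target_code: 1-D array of length D describing the target word.
    Returns an array of shape (T, C + D, H, W).
    """
    t, _, h, w = video.shape
    d = target_code.shape[0]
    # Broadcast the code to a (T, D, H, W) block: one constant plane per code entry.
    planes = np.broadcast_to(target_code.reshape(1, d, 1, 1), (t, d, h, w))
    # Concatenate along the channel axis so every pixel of every frame sees the code.
    return np.concatenate([video, planes.astype(video.dtype)], axis=1)


# Usage: an 8-frame RGB clip conditioned on a one-hot label over 500 words.
video = np.zeros((8, 3, 16, 16), dtype=np.float32)
one_hot = np.zeros(500, dtype=np.float32)
one_hot[42] = 1.0
conditioned = condition_video(video, one_hot)
print(conditioned.shape)  # (8, 503, 16, 16)
```

With a one-hot word label, the number of extra input channels equals the vocabulary size, so the conditioning input grows linearly with the number of domains.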

## Abstract

Despite remarkable success in image-to-image translation, driven by advances in generative adversarial networks (GANs), very few attempts at video domain translation are known. We study the task of video-to-video translation in the context of visual speech generation, where the goal is to transform an input video of any spoken word into an output video of a different word. This is a multi-domain translation, where each word forms a domain of videos uttering this word. Adapting the state-of-the-art image-to-image translation model (StarGAN) to this setting falls short with a large vocabulary size. Instead, we propose to use character encodings of the words and design a novel character-based GAN architecture for video-to-video translation called Visual Speech GAN (ViSpGAN). We are the first to demonstrate video-to-video translation with a vocabulary of 500 words.
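The abstract contrasts a per-word domain label with character encodings of the words. A minimal sketch of one possible character encoding follows: each letter becomes a one-hot row over an alphabet, and the word is padded to a fixed length. The A-Z alphabet, the padding scheme, the `max_len` value and the name `encode_word` are all assumptions for illustration; the summary does not specify the paper's exact encoding.

```python
import string
from typing import List

# Assumed alphabet: uppercase A-Z only (hypothetical; the paper's alphabet may differ).
ALPHABET = string.ascii_uppercase


def encode_word(word: str, max_len: int) -> List[List[int]]:
    """Encode a word as max_len one-hot rows over ALPHABET (hypothetical helper).

    Row i marks the i-th letter of the word; rows past the end of the word are
    all zeros (padding). Raises ValueError for words longer than max_len or
    for characters outside ALPHABET.
    """
    word = word.upper()
    if len(word) > max_len:
        raise ValueError(f"{word!r} is longer than max_len={max_len}")
    rows = []
    for i in range(max_len):
        row = [0] * len(ALPHABET)
        if i < len(word):
            # str.index raises ValueError if the character is not in ALPHABET.
            row[ALPHABET.index(word[i])] = 1
        rows.append(row)
    return rows


# Usage: the code size depends on max_len and the alphabet, not on the vocabulary.
encoding = encode_word("about", max_len=12)
print(len(encoding), len(encoding[0]))  # 12 26
```

Under these assumptions, a 12-character encoding occupies 12 x 26 = 312 entries whatever the vocabulary size, whereas a one-hot word label needs one entry per word (500 here). Words that share letters also share parts of their code, which a one-hot label cannot express.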

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.12043/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1905.12043/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/1905.12043/full.md

---
Source: https://tomesphere.com/paper/1905.12043