AudioVisual Speech Synthesis: A brief literature review
Efthymios Georgiou, Athanasios Katsamanis

TL;DR
This literature review explores audiovisual speech synthesis by analyzing the separate components of text-to-speech conversion and talking head animation, highlighting various models and their advantages and disadvantages.
Contribution
It provides a comprehensive categorization and discussion of existing methods in audiovisual speech synthesis, emphasizing the importance of facial models and intermediate representations.
Findings
Different TTS models map text to acoustic features
Voice-driven animation approaches vary by facial model type
Review highlights strengths and weaknesses of various methods
Abstract
This brief literature review studies audiovisual speech synthesis, the problem of generating an animated talking head from input text. Because the problem is highly complex, we treat it as the composition of two subproblems: Text-to-Speech (TTS) synthesis and voice-driven talking-head animation. For TTS, we present models that map text to intermediate acoustic representations, e.g., mel-spectrograms, as well as models that generate voice signals conditioned on these intermediate representations, i.e., vocoders. For the talking-head animation problem, we categorize approaches based on whether they produce human faces or anthropomorphic figures. We also attempt to discuss the importance of the choice of facial model in the second case. Throughout the review, we briefly describe the most important work in…
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing
