Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models
Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro

TL;DR
This paper introduces a novel method to reprogram pre-trained audio-driven talking face models to generate face videos from text inputs, eliminating the need for speech recordings during inference.
Contribution
It proposes a Text-to-Audio Embedding Module (TAEM) that maps text into the audio latent space, incorporating speaker characteristics from a single face image, enabling flexible text-driven face synthesis.
Findings
Effective text-to-face video generation demonstrated
Compatible with various pre-trained audio-driven models
High-quality face videos from text inputs achieved
Abstract
In this paper, we present a method for reprogramming pre-trained audio-driven talking face synthesis models to operate in a text-driven manner. Consequently, we can easily generate face videos that articulate the provided textual sentences, eliminating the necessity of recording speech for each inference, as required in the audio-driven model. To this end, we propose to embed the input text into the learned audio latent space of the pre-trained audio-driven model, while preserving the face synthesis capability of the original pre-trained model. Specifically, we devise a Text-to-Audio Embedding Module (TAEM) which maps a given text input into the audio latent space by modeling pronunciation and duration characteristics. Furthermore, to consider the speaker characteristics in audio while using text inputs, TAEM is designed to accept a visual speaker embedding. The visual speaker embedding…
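As a rough illustration of the idea described in the abstract, the sketch below maps a phoneme sequence to a frame-level sequence of vectors by combining a per-phoneme embedding (pronunciation), a per-phoneme frame count (duration), and a speaker vector. It is a toy under stated assumptions, not the authors' TAEM: the names `toy_embedding`, `text_to_audio_latents`, and `LATENT_DIM`, the additive speaker conditioning, and the repeat-by-duration expansion are all hypothetical choices made for illustration, and the embeddings are random stand-ins rather than learned parameters.

```python
import random

LATENT_DIM = 4  # hypothetical size of the audio latent space


def toy_embedding(key, dim=LATENT_DIM):
    """Deterministic random vector standing in for a learned embedding."""
    rng = random.Random(key)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def text_to_audio_latents(phonemes, durations, speaker_embedding):
    """Expand phonemes into a frame-level sequence of latent vectors.

    phonemes: list of phoneme symbols (pronunciation)
    durations: number of frames each phoneme lasts (duration)
    speaker_embedding: vector standing in for the visual speaker embedding
    """
    if len(phonemes) != len(durations):
        raise ValueError("one duration per phoneme is required")
    frames = []
    for phoneme, n_frames in zip(phonemes, durations):
        content = toy_embedding(phoneme)
        # Condition on the speaker by simple addition (an assumption).
        conditioned = [c + s for c, s in zip(content, speaker_embedding)]
        # Repeat the vector once per frame the phoneme occupies.
        frames.extend(list(conditioned) for _ in range(n_frames))
    return frames


phonemes = ["HH", "AH", "L", "OW"]
durations = [2, 3, 1, 4]  # a real model would predict these from text
speaker = toy_embedding("face-image-0")  # stand-in for a face-derived vector
latents = text_to_audio_latents(phonemes, durations, speaker)
print(len(latents), len(latents[0]))
```

In this toy, the resulting frame sequence plays the role that encoded audio features would play as input to a frozen audio-driven face synthesis model; the sequence length equals the sum of the durations.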
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
