Robust One Shot Audio to Video Generation

Neeraj Kumar; Srishti Goel; Ankur Narang; Mujtaba Hasan

arXiv:2012.07842·cs.CV·December 16, 2020

TL;DR

This paper introduces OneShotA2V, a method that generates high-quality talking-head videos from a single image and an audio signal. It uses curriculum learning and few-shot learning to adapt to unseen persons, and it works with audio in multiple languages.

Contribution

The paper presents an approach that combines curriculum learning with few-shot adaptation for one-shot audio-to-video generation of talking heads, and reports better SSIM, PSNR, and CPBD scores than prior methods.

Findings

01. Superior quantitative metrics (SSIM, PSNR, CPBD) compared to prior methods.

02. Qualitative and Turing tests confirm high realism and effectiveness.

03. Multilingual applicability demonstrated across diverse audio inputs.
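
For readers unfamiliar with the metrics in finding 01, PSNR is the simplest to state: it is 10·log10(MAX² / MSE), where MSE is the mean squared error between a reference frame and a generated frame. The sketch below is a minimal illustration of that standard formula, not the authors' evaluation code. The function name `psnr`, the flat-list frame representation, and the default `max_value` of 255 (8-bit pixels) are assumptions made for the example.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are given as flat sequences of pixel intensities (an assumption
    made for this sketch; real pipelines usually work on image arrays).
    """
    if len(reference) != len(generated):
        raise ValueError("frames must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("frames must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)

    # Identical frames have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the generated frame is closer to the reference. A per-video score would typically average this value over frames, though the source does not say how the authors aggregate it.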

Abstract

Audio-to-video generation is an interesting problem with numerous applications across industry verticals, including film making, multimedia, marketing, and education. High-quality video generation with expressive facial movements is a challenging problem that involves complex learning steps for generative adversarial networks. Further, enabling one-shot learning for a single unseen image increases the complexity of the problem while making it more applicable to practical scenarios. In this paper, we propose a novel approach, OneShotA2V, to synthesize a talking-person video of arbitrary length from two inputs: an audio signal and a single unseen image of a person. OneShotA2V leverages curriculum learning to learn the movements of expressive facial components and hence generates a high-quality talking-head video of the given person. Further, it feeds the features…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing · Digital Media Forensic Detection