One Shot Audio to Animated Video Generation
Neeraj Kumar, Srishti Goel, Ankur Narang, Brejesh Lall, Mujtaba Hasan, Pranshu Agarwal, Dipankar Sarkar

TL;DR
This paper introduces OneShotAu2AV, a novel two-stage method for generating animated videos from audio and a single image, capable of producing synchronized lip movements, facial expressions, and head motions for arbitrary-length videos.
Contribution
The paper presents a new two-stage approach that converts audio and a single image into animated videos, including a novel unsupervised domain translation from human to animated domain.
Findings
Outperforms U-GAT-IT and RecycleGan on multiple metrics
Generates lip-synced, expressive animated videos of arbitrary length
Applicable to multiple languages without restrictions
Abstract
We consider the challenging problem of audio to animated video generation. We propose a novel method, OneShotAu2AV, to generate an animated video of arbitrary length using an audio clip and a single unseen image of a person as input. The proposed method consists of two stages. In the first stage, OneShotAu2AV generates the talking-head video in the human domain given an audio clip and a person's image. In the second stage, the talking-head video from the human domain is converted to the animated domain. The model architecture of the first stage consists of a multi-level generator based on spatially adaptive normalization and multiple multi-level discriminators, along with multiple adversarial and non-adversarial losses. The second stage leverages an attention-based, normalization-driven GAN architecture along with a temporal-predictor-based recycle loss and a blink loss coupled with a lipsync loss, for…
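The two-stage structure described in the abstract can be sketched as a simple function composition. This is only an illustrative skeleton, not the authors' implementation: the names `one_shot_audio_to_animated`, `talking_head_stage`, and `domain_translation_stage` are hypothetical, and the stub stages below stand in for the trained GAN models, which are not specified here.

```python
from typing import Any, Callable, List, Sequence


def one_shot_audio_to_animated(
    audio: Sequence[Any],
    image: Any,
    talking_head_stage: Callable[[Sequence[Any], Any], List[Any]],
    domain_translation_stage: Callable[[List[Any]], List[Any]],
) -> List[Any]:
    """Compose the two stages (all names here are hypothetical).

    Stage 1: audio + single image -> talking-head frames in the human domain.
    Stage 2: human-domain frames -> frames in the animated domain.
    """
    human_frames = talking_head_stage(audio, image)
    return domain_translation_stage(human_frames)


# Stub stages for illustration only; real stages would be trained models.
def stub_talking_head(audio, image):
    # Assumption: one output frame per audio chunk.
    return [f"human:{image}:{chunk}" for chunk in audio]


def stub_translate(frames):
    # Relabel each frame to mark the domain change.
    return [frame.replace("human", "animated", 1) for frame in frames]


result = one_shot_audio_to_animated(
    [0.1, 0.2, 0.3], "face.png", stub_talking_head, stub_translate
)
```

Passing the stages in as callables keeps the sketch independent of any particular model; the output length follows the audio length only because the stub emits one frame per audio chunk.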
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Video Analysis and Summarization
