Audio-visual video face hallucination with frequency supervision and cross modality support by speech based lip reading loss
Shailza Sharma, Abhinav Dhall, Vinay Kumar, Vivek Singh Bawa

TL;DR
This paper introduces a novel audio-visual GAN for video face hallucination that leverages speech signals and frequency supervision to improve facial detail and motion consistency in videos.
Contribution
It proposes a cross-modal GAN architecture with a speech-based lip reading loss and frequency-based loss to enhance facial detail and motion accuracy in video face hallucination.
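As a rough illustration of what frequency supervision can look like, the sketch below compares a generated clip with its ground truth in the Fourier domain. This summary does not give the paper's formulation, so it should not be read as the authors' loss: the function name `frequency_l1_loss`, the grayscale `(T, H, W)` input layout, and the choice of a mean L1 distance over the 2D spectrum are all assumptions made for the example.

```python
import numpy as np


def frequency_l1_loss(pred, target):
    """Mean magnitude of the difference between per-frame 2D Fourier spectra.

    Hypothetical helper, not taken from the paper.
    pred, target: arrays of shape (T, H, W), i.e. T grayscale frames.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    # 2D FFT over the spatial axes of every frame; "ortho" keeps the
    # transform unitary so the scale does not depend on frame size.
    pred_f = np.fft.fft2(pred, axes=(-2, -1), norm="ortho")
    target_f = np.fft.fft2(target, axes=(-2, -1), norm="ortho")

    # Complex difference, so both amplitude and phase errors are penalized.
    return float(np.mean(np.abs(pred_f - target_f)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sharp = rng.random((4, 16, 16))              # stand-in for ground-truth frames
    flat = np.full_like(sharp, sharp.mean())     # stand-in for an over-smoothed output
    print(frequency_l1_loss(sharp, sharp))       # identical clips give zero loss
    print(frequency_l1_loss(flat, sharp) > 0.0)  # lost detail is penalized
```

In a GAN training setup a term like this would typically be added to the pixel and adversarial losses with a weighting factor; the weighting and any emphasis on particular frequency bands are left open here because the summary does not state them.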
Findings
Significant improvement over state-of-the-art methods in visual quality.
Enhanced motion consistency and facial detail in hallucinated videos.
Effective handling of blurriness around key facial regions.
Abstract
Recently, there have been numerous breakthroughs in face hallucination tasks. However, the task remains considerably more challenging for videos than for images because of inherent consistency issues. The extra temporal dimension in video face hallucination makes it non-trivial to learn the facial motion throughout the sequence. To learn these fine spatio-temporal motion details, we propose a novel cross-modal audio-visual Video Face Hallucination Generative Adversarial Network (VFH-GAN). The architecture exploits the semantic correlation between the movement of the facial structure and the associated speech signal. Another major issue in present video-based approaches is blurriness around key facial regions such as the mouth and lips, where spatial displacement is much higher than in other areas. The proposed approach explicitly defines a…
Taxonomy
Topics: Advanced Image Processing Techniques · Facial Nerve Paralysis Treatment and Research · Image and Signal Denoising Methods
