Visual Speech Enhancement Without A Real Visual Stream

Sindhu B Hegde; K R Prajwal; Rudrabha Mukhopadhyay; Vinay Namboodiri; C.V. Jawahar

arXiv:2012.10852 · cs.CV · December 22, 2020 · 1 citation

TL;DR

This paper introduces a speech enhancement method that uses a pseudo-lip model to generate visual cues for noise reduction. It works even without real video input, and its speech intelligibility comes within 3% of methods that use real lip movements.

Contribution

The paper presents a new paradigm for speech enhancement that synthesizes lip movements from audio, enabling visual noise filtering without a real visual stream.

Findings

01

The pseudo-lip approach achieves speech intelligibility within 3% of methods that use real lip movements.

02

The model performs well across a wide range of real-world noise conditions.

03

Code and models are publicly available for future research.

Abstract

In this work, we re-think the task of speech enhancement in unconstrained real-world environments. Current state-of-the-art methods use only the audio stream and are limited in their performance in a wide range of real-world noises. Recent works using lip movements as additional cues improve the quality of generated speech over "audio-only" methods. But, these methods cannot be used for several applications where the visual stream is unreliable or completely absent. We propose a new paradigm for speech enhancement by exploiting recent breakthroughs in speech-driven lip synthesis. Using one such model as a teacher network, we train a robust student network to produce accurate lip movements that mask away the noise, thus acting as a "visual noise filter". The intelligibility of the speech enhanced by our pseudo-lip approach is comparable (< 3% difference) to the case of using real lips.…
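The teacher-student idea in the abstract can be illustrated with a toy, standard-library-only sketch. It assumes the teacher sees clean speech features while the student sees only the noisy mixture and is trained to reproduce the teacher's lip output. Everything here is hypothetical: the 2-d "audio features", the scalar "lip opening", the linear `teacher`, `student`, and the squared-error loss stand in for the real lip-synthesis networks and losses, which are not described on this page.

```python
"""Toy teacher-student distillation sketch (hypothetical; not the authors' code).

A frozen teacher maps *clean* audio features to a lip feature. A student that
only sees *noisy* audio features is trained to match the teacher's output.
"""
import random


def teacher(clean):
    # Hypothetical frozen teacher: fixed linear map from a 2-d clean audio
    # feature to a scalar "lip opening" value.
    return 0.8 * clean[0] - 0.3 * clean[1]


def student(weights, noisy):
    # Hypothetical student: linear model applied to the noisy features.
    return sum(w * x for w, x in zip(weights, noisy))


def make_batch(rng, n=64, noise_std=0.3):
    # (clean, noisy) feature pairs; noise is additive Gaussian.
    batch = []
    for _ in range(n):
        clean = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        noisy = [c + rng.gauss(0.0, noise_std) for c in clean]
        batch.append((clean, noisy))
    return batch


def mse(weights, batch):
    # Mean squared gap between student (noisy input) and teacher (clean input).
    return sum((student(weights, n) - teacher(c)) ** 2 for c, n in batch) / len(batch)


def distill(steps=500, lr=0.1, seed=0):
    # Plain mini-batch gradient descent on the distillation loss above.
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    for _ in range(steps):
        batch = make_batch(rng)
        grad = [0.0, 0.0]
        for clean, noisy in batch:
            err = student(weights, noisy) - teacher(clean)
            for i in range(2):
                grad[i] += 2.0 * err * noisy[i] / len(batch)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    w = distill()
    held_out = make_batch(random.Random(123), n=2000)
    print("untrained student MSE:", mse([0.0, 0.0], held_out))
    print("distilled student MSE:", mse(w, held_out))
```

The point of the setup is that the student never needs clean audio or video at inference time: it learns lip features from noisy input alone. Because the teacher's targets come from clean speech, the student is pushed toward outputs that ignore the noise, which is the intuition behind calling the generated lips a "visual noise filter".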

Code & Models

Repositories

Sindhu-Hegde/pseudo-visual-speech-denoising
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies