CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement

Mandar Gogate; Kia Dashtipour; Ahsan Adeel; Amir Hussain

arXiv:1909.10407·cs.SD·September 24, 2019

Open Access

TL;DR

CochleaNet is a novel, robust, language-independent audio-visual deep neural network designed for speech enhancement in noisy environments, leveraging visual cues and trained on limited data to outperform existing methods.

Contribution

The paper introduces CochleaNet, a causal audio-visual (AV) speech enhancement model that generalizes across languages, noise types, and speakers, challenging the belief that large multi-language AV datasets are necessary.

Findings

01

CochleaNet outperforms state-of-the-art speech enhancement (SE) approaches in objective and subjective tests.

02

The model trained on limited synthetic data generalizes well to real noisy environments.

03

The approach demonstrates robustness across languages, speaker variations, and noise types.

Abstract

Noisy situations cause huge problems for sufferers of hearing loss, as hearing aids often make the signal more audible but do not always restore intelligibility. In noisy settings, humans routinely exploit the audio-visual (AV) nature of speech to selectively suppress the background noise and focus on the target speaker. In this paper, we present a causal, language-, noise- and speaker-independent AV deep neural network (DNN) architecture for speech enhancement (SE). The model exploits noisy acoustic cues and noise-robust visual cues to focus on the desired speaker and improve speech intelligibility. To evaluate the proposed SE framework, a first-of-its-kind AV binaural speech corpus, called ASPIRE, is recorded in real noisy environments, including a cafeteria and a restaurant. We demonstrate superior performance of our approach in terms of objective measures and subjective…

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing