Towards Intelligibility-Oriented Audio-Visual Speech Enhancement

Tassadaq Hussain; Mandar Gogate; Kia Dashtipour; Amir Hussain

arXiv:2111.09642·cs.SD·November 19, 2021

TL;DR

This paper introduces a novel audio-visual speech enhancement model that uses an intelligibility-oriented loss function based on a modified STOI metric, improving robustness and generalization in noisy environments.

Contribution

To the authors' knowledge, it is the first work to combine audio-visual information with an intelligibility-oriented loss function for speech enhancement, and it reports better performance than audio-only and conventional AV models on unseen speakers and noises.

Findings

01. Outperforms audio-only and conventional AV models on unseen speakers and noises.

02. Uses a fully convolutional AV model with a modified STOI loss function.

03. Shows improved speech intelligibility in noisy conditions.

Abstract

Existing deep learning (DL) based speech enhancement approaches are generally optimised to minimise the distance between clean and enhanced speech features. These approaches often improve speech quality; however, they suffer from a lack of generalisation and may not deliver the required speech intelligibility in real noisy situations. In an attempt to address these challenges, researchers have explored intelligibility-oriented (I-O) loss functions and the integration of audio-visual (AV) information for more robust speech enhancement (SE). In this paper, we introduce DL-based I-O SE algorithms exploiting AV information, which is a novel and previously unexplored research direction. Specifically, we present a fully convolutional AV SE model that uses a modified short-time objective intelligibility (STOI) metric as a training cost function. To the best of our knowledge, this is the first work…
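To make the contrast between a distance-based cost and an intelligibility-oriented cost concrete, here is a minimal sketch in plain Python. It is not the authors' modified STOI loss, whose details are not given on this page. It keeps only the core idea of STOI, the correlation between short-time temporal envelopes of clean and processed speech averaged over bands and segments, and it omits the one-third octave filterbank, the normalisation and the clipping steps of the full metric. The function names (`stoi_like_score`, `io_loss`, `mse_loss`), the input layout (one list of frame envelopes per band) and the synthetic data are assumptions made for illustration.

```python
import math


def _correlation(x, y, eps=1e-8):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    xc = [a - mean_x for a in x]
    yc = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(xc, yc))
    den = math.sqrt(sum(a * a for a in xc)) * math.sqrt(sum(b * b for b in yc))
    return num / (den + eps)  # eps guards against constant segments


def stoi_like_score(clean_env, enhanced_env, seg_len=30):
    """Mean envelope correlation over all bands and sliding segments.

    Both inputs are lists of bands; each band is a list of per-frame
    envelope values. seg_len=30 mirrors the segment length used by STOI;
    everything else here is a simplification (assumption).
    """
    scores = []
    for clean_band, enh_band in zip(clean_env, enhanced_env):
        for start in range(len(clean_band) - seg_len + 1):
            scores.append(_correlation(clean_band[start:start + seg_len],
                                       enh_band[start:start + seg_len]))
    if not scores:
        raise ValueError("each band needs at least seg_len frames")
    return sum(scores) / len(scores)


def io_loss(clean_env, enhanced_env, seg_len=30):
    """Intelligibility-oriented cost: minimising it maximises the score."""
    return -stoi_like_score(clean_env, enhanced_env, seg_len)


def mse_loss(clean_env, enhanced_env):
    """Conventional distance-based cost over the same envelopes."""
    diffs = [(c - e) ** 2
             for cb, eb in zip(clean_env, enhanced_env)
             for c, e in zip(cb, eb)]
    return sum(diffs) / len(diffs)


# Synthetic envelopes (hypothetical data): 3 bands, 40 frames each.
clean = [[math.sin(0.3 * t + b) + 1.5 for t in range(40)] for b in range(3)]
scaled = [[0.5 * v for v in band] for band in clean]  # same shape, half the level

print("MSE loss:", mse_loss(clean, scaled))
print("I-O loss:", io_loss(clean, scaled))
```

In this toy case the scaled envelopes incur a clear MSE penalty, while the correlation-based cost treats them as almost ideal because the temporal modulation pattern is preserved. That difference in what gets penalised is the motivation for training against an intelligibility-style objective instead of a feature distance. In an actual training setup the same computation would have to be written with differentiable tensor operations.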

Code & Models

Repositories

cogmhear/Intelligibility-Oriented-Audio-Visual-Speech-Enhancement (PyTorch, official)
