Lip-Reading Driven Deep Learning Approach for Speech Enhancement

Ahsan Adeel; Mandar Gogate; Amir Hussain; William M. Whitmer

arXiv:1808.00046 · cs.CV · August 2, 2018

TL;DR

This paper introduces a novel audio-visual speech enhancement framework that combines deep learning-based lip-reading with an enhanced Wiener filter, significantly improving speech quality and intelligibility in noisy real-world scenarios.

Contribution

It presents a new deep learning lip-reading model and an enhanced visually-derived Wiener filter, integrating visual and acoustic modeling for superior speech enhancement.

Findings

1. Significant improvement in speech quality and intelligibility over benchmark methods.

2. Effective performance across various real-world noisy environments.

3. Demonstrated robustness at different SNR levels.

Abstract

This paper proposes a novel lip-reading driven deep learning framework for speech enhancement. The proposed approach leverages the complementary strengths of deep learning and analytical acoustic modelling (a filtering-based approach), in contrast to recently published, comparatively simpler benchmark approaches that rely only on deep learning. The proposed audio-visual (AV) speech enhancement framework operates at two levels. In the first level, a novel deep learning-based lip-reading regression model is employed. In the second level, the clean-audio features approximated by lip-reading are exploited, using an enhanced, visually-derived Wiener filter (EVWF), to estimate the clean audio power spectrum. Specifically, a stacked long short-term memory (LSTM) based lip-reading regression model is designed to estimate clean audio features using only temporal visual features considering…
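The second level described above rests on Wiener filtering, so a minimal sketch of the classic per-bin Wiener gain may help make the idea concrete. This is not the paper's EVWF, whose details are not given in this listing; it is the textbook filter H = S / (S + N). The function names `wiener_gain` and `enhance_frame` are hypothetical, and the sketch assumes the lip-reading model's output has already been converted into a per-bin clean power spectrum estimate for one frame.

```python
def wiener_gain(clean_power, noise_power, floor=1e-12):
    """Classic per-bin Wiener gain H = S / (S + N).

    clean_power: estimated clean-speech power per frequency bin
                 (assumed here to come from a visual model).
    noise_power: estimated noise power per frequency bin.
    floor: small constant that avoids division by zero.
    """
    if len(clean_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    return [s / max(s + n, floor) for s, n in zip(clean_power, noise_power)]


def enhance_frame(noisy_magnitude, clean_power_estimate):
    """Apply the Wiener gain to one frame of noisy spectral magnitudes.

    Noise power is approximated as the non-negative difference between
    the noisy power and the estimated clean power (an assumption of this
    sketch, not a detail taken from the paper).
    """
    noisy_power = [m * m for m in noisy_magnitude]
    noise_power = [
        max(y - s, 0.0) for y, s in zip(noisy_power, clean_power_estimate)
    ]
    gain = wiener_gain(clean_power_estimate, noise_power)
    return [g * m for g, m in zip(gain, noisy_magnitude)]
```

Bins where the estimated clean power dominates pass almost unchanged, while bins dominated by noise are attenuated toward zero. In a full pipeline the enhanced magnitudes would be recombined with the noisy phase and inverted back to a waveform, a step omitted here.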
