Improved Lite Audio-Visual Speech Enhancement

Shang-Yi Chuang; Hsin-Min Wang; Yu Tsao

arXiv:2008.13222 · eess.AS · February 2, 2022 · 6 citations

TL;DR

This paper introduces iLAVSE, an improved lightweight audio-visual speech enhancement system that handles practical issues such as the cost of processing visual data, audio-visual asynchronization, and low-quality visual input, which makes it suitable for real-world noisy environments.

Contribution

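The study extends the LAVSE framework to address three real-world challenges (the additional cost of processing visual data, audio-visual asynchronization, and low-quality visual data), improving the robustness and efficiency of audio-visual speech enhancement.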

Findings

1. iLAVSE outperforms conventional AVSE in noisy conditions.
2. It effectively manages visual data quality and synchronization issues.
3. The system is suitable for scenarios with low-quality visual inputs.
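Robustness to asynchronization and low-quality visual input can be probed (or trained for) by perturbing the visual stream before it is fused with the audio features. The Python sketch below is an illustrative assumption, not the authors' implementation: it assumes frame-aligned audio and visual feature vectors, and the helpers `shift_visual`, `drop_visual`, and `fuse` are hypothetical names. Simple concatenation stands in for fusion; the source states only that iLAVSE uses a convolutional recurrent neural network as its core model.

```python
import random
from typing import List, Sequence

# Hypothetical layout: one feature vector per frame, audio and visual frame-aligned.
Frame = List[float]


def shift_visual(visual: Sequence[Frame], offset: int) -> List[Frame]:
    """Simulate audio-visual asynchronization.

    offset > 0 delays the visual stream by `offset` frames relative to the audio;
    offset < 0 advances it. Positions with no source frame are zero-padded.
    """
    n = len(visual)
    dim = len(visual[0]) if n else 0
    out: List[Frame] = []
    for t in range(n):
        src = t - offset
        out.append(list(visual[src]) if 0 <= src < n else [0.0] * dim)
    return out


def drop_visual(visual: Sequence[Frame], drop_prob: float, rng: random.Random) -> List[Frame]:
    """Simulate low-quality visual input by zeroing each frame with probability drop_prob."""
    dim = len(visual[0]) if visual else 0
    return [[0.0] * dim if rng.random() < drop_prob else list(f) for f in visual]


def fuse(audio: Sequence[Frame], visual: Sequence[Frame]) -> List[Frame]:
    """Concatenate audio and visual features frame by frame (a stand-in for real fusion)."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual streams must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(audio, visual)]


if __name__ == "__main__":
    rng = random.Random(0)
    audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    visual = [[1.0], [2.0], [3.0]]
    perturbed = drop_visual(shift_visual(visual, 1), drop_prob=0.3, rng=rng)
    print(fuse(audio, perturbed))
```

Zero-padding was chosen so that a misaligned or missing visual frame carries no information rather than misleading information; a model trained on such inputs is pushed to fall back on the audio features.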

Abstract

Numerous studies have investigated the effectiveness of audio-visual multimodal learning for speech enhancement (AVSE) tasks, seeking a solution that uses visual data as auxiliary and complementary input to reduce the noise of noisy speech signals. Recently, we proposed a lite audio-visual speech enhancement (LAVSE) algorithm for a car-driving scenario. Compared to conventional AVSE systems, LAVSE requires less online computation and to some extent solves the user privacy problem on facial data. In this study, we extend LAVSE to improve its ability to address three practical issues often encountered in implementing AVSE systems, namely, the additional cost of processing visual data, audio-visual asynchronization, and low-quality visual data. The proposed system is termed improved LAVSE (iLAVSE), which uses a convolutional recurrent neural network architecture as the core AVSE model. We…
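The abstract lists the additional cost of processing visual data as one practical issue. One generic way to reduce that cost is to store or transmit visual features at lower numeric precision. The Python sketch below shows plain uniform quantization to `bits` bits per value; it is an assumption for illustration, not a description of the iLAVSE procedure (the abstract excerpt is truncated before any such detail), and `quantize`/`dequantize` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def quantize(values: Sequence[float], bits: int) -> Tuple[List[int], float, float]:
    """Uniformly map floats onto integer codes in [0, 2**bits - 1].

    Returns the codes plus the offset (min value) and step size needed to
    reconstruct approximate floats.
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # A constant input has zero range; any positive step size reconstructs it exactly.
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / step))) for v in values]
    return codes, lo, step


def dequantize(codes: Sequence[int], lo: float, step: float) -> List[float]:
    """Reconstruct approximate floats; the error per value is at most step / 2."""
    return [lo + c * step for c in codes]


if __name__ == "__main__":
    features = [0.0, 0.25, 0.5, 1.0, -0.3]
    codes, lo, step = quantize(features, bits=4)
    restored = dequantize(codes, lo, step)
    # Storing 4-bit codes instead of 32-bit floats cuts storage by 8x (plus lo and step).
    print(codes, max(abs(a - b) for a, b in zip(features, restored)))
```

The trade-off is explicit: each bit removed halves the number of levels and roughly doubles the worst-case reconstruction error, so the bit depth would need to be tuned against enhancement quality.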

Code & Models

Repositories

kagaminccino/LAVSE (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Blind Source Separation Techniques