Improved Lite Audio-Visual Speech Enhancement
Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao

TL;DR
This paper introduces iLAVSE, an improved lightweight audio-visual speech enhancement system that addresses three practical issues in deploying AVSE: the additional cost of processing visual data, audio-visual asynchronization, and low-quality visual data. The authors present it as suitable for real-world noisy environments.
Contribution
The study extends the LAVSE framework to address real-world challenges, enhancing robustness and efficiency in audio-visual speech enhancement applications.
Findings
iLAVSE outperforms conventional AVSE in noisy conditions
It effectively manages visual data quality and synchronization issues
The system is suitable for scenarios with low-quality visual inputs
Abstract
Numerous studies have investigated the effectiveness of audio-visual multimodal learning for speech enhancement (AVSE) tasks, seeking a solution that uses visual data as auxiliary and complementary input to reduce the noise of noisy speech signals. Recently, we proposed a lite audio-visual speech enhancement (LAVSE) algorithm for a car-driving scenario. Compared to conventional AVSE systems, LAVSE requires less online computation and to some extent solves the user privacy problem on facial data. In this study, we extend LAVSE to improve its ability to address three practical issues often encountered in implementing AVSE systems, namely, the additional cost of processing visual data, audio-visual asynchronization, and low-quality visual data. The proposed system is termed improved LAVSE (iLAVSE), which uses a convolutional recurrent neural network architecture as the core AVSE model. We…
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Blind Source Separation Techniques
