VSANet: Real-time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention
Yuewei Zhang, Huanbin Zou, Jie Zhu

TL;DR
VSANet is a real-time speech enhancement network that trains speech enhancement and voice activity detection jointly on a shared encoder and adds a causal spatial attention block; the authors report that this improves enhancement performance.
Contribution
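The paper proposes a multi-task learning framework in which a speech enhancement module and a voice activity detection module share one encoder, together with a causal spatial attention (CSA) block intended to strengthen the DNN's representation capability for real-time speech enhancement.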
Findings
Multi-task learning improves speech enhancement quality.
Causal spatial attention enhances DNN feature representation.
VSANet outperforms existing methods in speech enhancement tasks.
Abstract
Deep learning-based speech enhancement (SE) methods typically take the clean speech's waveform or time-frequency spectrum feature as the learning target and train the deep neural network (DNN) by minimizing the error between the DNN's output and that target. This conventional single-task learning paradigm has proven effective, but we find that a multi-task learning framework can improve SE performance. Specifically, we design a framework containing an SE module and a voice activity detection (VAD) module that share the same encoder, and the whole network is optimized with a weighted loss over the two modules. Moreover, we design a causal spatial attention (CSA) block to improve the representation capability of the DNN. Combining the VAD-aided multi-task learning framework and the CSA block, our SE network is named VSANet. The experimental results prove the…
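The weighted two-module loss described in the abstract can be illustrated with a minimal sketch. The abstract does not state the individual loss functions or the weighting, so everything below is an assumption: mean squared error stands in for the SE loss, binary cross-entropy stands in for the VAD loss, and `vad_weight` is a hypothetical hyperparameter name. It is not the authors' implementation.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences (assumed SE loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def bce(probs, labels, eps=1e-7):
    """Binary cross-entropy over per-frame speech probabilities (assumed VAD loss)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def multitask_loss(se_pred, se_target, vad_probs, vad_labels, vad_weight=0.5):
    """Weighted sum of the two task losses.

    vad_weight is a hypothetical name and value; the abstract only says
    the network is optimized with a weighted loss over the two modules.
    """
    return mse(se_pred, se_target) + vad_weight * bce(vad_probs, vad_labels)


# Toy example: four enhanced-feature values and three VAD frames.
loss = multitask_loss(
    se_pred=[0.9, 0.1, 0.4, 0.0],
    se_target=[1.0, 0.0, 0.5, 0.0],
    vad_probs=[0.8, 0.3, 0.9],
    vad_labels=[1, 0, 1],
)
print(loss)
```

In a real network both terms would be computed from heads that sit on top of the shared encoder, so gradients from the VAD term also update the encoder that the SE module uses.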
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Advanced Adaptive Filtering Techniques
