Improvement of Noise-Robust Single-Channel Voice Activity Detection with Spatial Pre-processing
Max Væhrens, Andreas Jonas Fuglsig, Anders Post Jacobsen, Nicolai Almskou Rasmussen, Victor Mølbach Nissen, Joachim Roland Hejslet, and Zheng-Hua Tan

TL;DR
This paper enhances single-channel voice activity detection (VAD) in noisy environments by applying spatial pre-processing techniques, such as beamforming and spatial detection, leading to significant improvements over traditional methods and even multi-channel VAD in challenging conditions.
Contribution
The study introduces two spatial pre-processing methods for single-channel VAD, a beamformer and a spatial target speaker detector, and reports better noise robustness than existing approaches.
Findings
Spatial detector significantly improves VAD accuracy.
Pre-processing methods outperform baseline MVAD in noisy conditions.
SVAD with spatial pre-processing is effective across various noise types.
Abstract
Voice activity detection (VAD) remains a challenge in noisy environments. With access to multiple microphones, prior studies have attempted to improve the noise robustness of VAD by creating multi-channel VAD (MVAD) methods. However, MVAD is relatively new compared to single-channel VAD (SVAD), which has been thoroughly developed in the past. It might therefore be advantageous to improve SVAD methods with pre-processing to obtain superior VAD, which is under-explored. This paper improves SVAD through two pre-processing methods, a beamformer and a spatial target speaker detector. The spatial detector sets signal frames to zero when no potential speaker is present within a target direction. The detector may be implemented as a filter, meaning the input signal for the SVAD is filtered according to the detector's output; or it may be implemented as a spatial VAD to be combined with the SVAD…
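The two ways of using the spatial detector described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: `energy_svad` is a toy energy-threshold stand-in for a real single-channel VAD, `target_present` is a hypothetical per-frame boolean output of the spatial detector (how it is computed is not covered), and the logical AND used to combine the two decisions is one plausible rule, since the abstract does not say how the spatial VAD and the SVAD are combined.

```python
import numpy as np


def energy_svad(frames, threshold=1e-3):
    """Toy stand-in for an SVAD: flag frames whose mean energy exceeds a threshold."""
    return np.mean(frames ** 2, axis=1) > threshold


def detector_as_filter(frames, target_present, svad=energy_svad):
    """Filter mode: zero every frame with no potential speaker in the
    target direction, then run the SVAD on the gated signal."""
    gated = np.where(target_present[:, None], frames, 0.0)
    return svad(gated)


def detector_as_spatial_vad(frames, target_present, svad=energy_svad):
    """Spatial-VAD mode: run the SVAD on the unmodified frames and combine
    its decisions with the detector output (logical AND is an assumption)."""
    return svad(frames) & target_present


# Four frames of 160 samples: two loud, two near-silent (synthetic data).
frames = np.stack([
    np.full(160, 0.5),
    np.full(160, 0.5),
    np.full(160, 1e-4),
    np.full(160, 1e-4),
])
# Hypothetical detector output: a target-direction speaker in frames 0 and 2.
target_present = np.array([True, False, True, False])

filtered_decisions = detector_as_filter(frames, target_present)
combined_decisions = detector_as_spatial_vad(frames, target_present)
```

With this frame-wise toy SVAD, both modes mark only frame 0 as speech. An SVAD that uses temporal context could behave differently in the two modes, because in filter mode it sees zeroed frames, while in spatial-VAD mode it sees the original signal.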
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
