Transfer Learning for Voice Activity Detection: A Denoising Deep Neural Network Perspective
Xiao-Lei Zhang, Ji Wu

TL;DR
This paper explores transfer learning with denoising deep neural networks to improve voice activity detection across mismatched noisy datasets, addressing practical deployment challenges.
Contribution
It introduces three transfer learning techniques that learn shared feature representations to mitigate corpus mismatch in VAD.
Findings
Transfer learning improves VAD performance on mismatched datasets.
Denoising deep neural networks effectively learn shared features.
Experimental results confirm the effectiveness of the proposed schemes.
Abstract
The mismatch between the source and target noisy corpora severely hinders the practical use of machine-learning-based voice activity detection (VAD). In this paper, we address this problem from a transfer learning perspective. Transfer learning tries to find a common learning machine or a common feature subspace that is shared by both the source corpus and the target corpus. The denoising deep neural network is used as the learning machine. Three transfer techniques, which aim to learn common feature representations, are used for analysis. Experimental results demonstrate the effectiveness of the transfer learning schemes on the mismatch problem.
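The idea of learning one feature representation shared by the source and target corpora can be illustrated with a small sketch. The example below is not one of the paper's three transfer techniques, which the abstract does not specify; it is a generic one-hidden-layer denoising autoencoder in NumPy, trained on pooled source and target features. The synthetic data, the layer sizes, the noise level, and the helper names `train_dae` and `encode` are all assumptions made for illustration.

```python
import numpy as np


def train_dae(x, hidden=16, noise_std=0.3, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer denoising autoencoder by full-batch gradient descent.

    Hypothetical helper: hyperparameters are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        noisy = x + rng.normal(0.0, noise_std, x.shape)  # corrupt the input
        h = np.tanh(noisy @ w1 + b1)                     # encoder
        recon = h @ w2 + b2                              # linear decoder
        err = recon - x                                  # target is the clean input
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared reconstruction error.
        g_recon = 2.0 * err / err.size
        g_pre = (g_recon @ w2.T) * (1.0 - h ** 2)
        w2 -= lr * (h.T @ g_recon)
        b2 -= lr * g_recon.sum(axis=0)
        w1 -= lr * (noisy.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    return (w1, b1), losses


def encode(x, encoder):
    """Map features into the hidden representation of the trained encoder."""
    w1, b1 = encoder
    return np.tanh(x @ w1 + b1)


# Synthetic stand-ins for acoustic features (assumption): both corpora share
# a low-dimensional structure, but the target corpus is noisier.
rng = np.random.default_rng(1)
basis = rng.normal(size=(3, 8))
source = rng.normal(size=(200, 3)) @ basis + 0.1 * rng.normal(size=(200, 8))
target = rng.normal(size=(200, 3)) @ basis + 0.5 * rng.normal(size=(200, 8))

# Pool both corpora so that a single encoder has to serve both of them.
pooled = np.vstack([source, target])
mean, std = pooled.mean(axis=0), pooled.std(axis=0)
encoder, losses = train_dae((pooled - mean) / std)

source_codes = encode((source - mean) / std, encoder)
target_codes = encode((target - mean) / std, encoder)
```

In a setup like this, a speech/non-speech classifier would be trained on `source_codes`, where labels are available, and applied to `target_codes`; whether that helps on real data depends on how much structure the two corpora actually share.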
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
