Transfer Learning for Voice Activity Detection: A Denoising Deep Neural Network Perspective

Xiao-Lei Zhang; Ji Wu

arXiv:1303.2104·cs.LG·March 11, 2013·1 cites

TL;DR

This paper applies transfer learning with a denoising deep neural network to voice activity detection (VAD), targeting the performance loss that occurs when the source and target noisy corpora are mismatched.

Contribution

It analyzes three transfer techniques that aim to learn feature representations common to the source and target corpora, in order to mitigate corpus mismatch in VAD.

Findings

01 Transfer learning improves VAD performance on mismatched datasets.

02 Denoising deep neural networks effectively learn shared features.

03 Experimental results confirm the effectiveness of the proposed schemes.

Abstract

The mismatch between the source and target noisy corpora severely hinders the practical use of machine-learning-based voice activity detection (VAD). In this paper, we address this problem from a transfer learning perspective. Transfer learning tries to find a common learning machine or a common feature subspace that is shared by both the source corpus and the target corpus. A denoising deep neural network is used as the learning machine. Three transfer techniques, which aim to learn common feature representations, are analyzed. Experimental results demonstrate the effectiveness of the transfer learning schemes on the mismatch problem.
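
The idea of a feature representation shared by the source and target corpora can be illustrated with a small sketch. The abstract does not specify the three transfer techniques, so the code below is not the authors' method: it is a generic one-hidden-layer denoising autoencoder, trained on pooled source and target features so that its hidden layer acts as a common representation. The function names, layer sizes, noise level, and the synthetic Gaussian "features" are all assumptions made for illustration.

```python
import numpy as np


def train_denoising_autoencoder(X, hidden=8, noise_std=0.3, lr=0.1,
                                epochs=300, seed=0):
    """Train a one-hidden-layer denoising autoencoder by gradient descent.

    Hypothetical helper for illustration only. The input is corrupted with
    Gaussian noise and the network is trained to reconstruct the clean input.
    Returns the parameters and the per-epoch mean squared reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        X_noisy = X + rng.normal(0.0, noise_std, X.shape)  # corrupt the input
        H = np.tanh(X_noisy @ W1 + b1)                     # hidden representation
        X_hat = H @ W2 + b2                                # linear reconstruction
        err = X_hat - X                                    # target is the clean input
        losses.append(float(np.mean(err ** 2)))

        # Backpropagate the mean squared error.
        g_out = 2.0 * err / (n * d)
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        gZ = (g_out @ W2.T) * (1.0 - H ** 2)               # tanh derivative
        gW1 = X_noisy.T @ gZ
        gb1 = gZ.sum(axis=0)

        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


def encode(params, X):
    """Map features into the hidden layer learned by the autoencoder."""
    W1, b1, _, _ = params
    return np.tanh(X @ W1 + b1)


# Synthetic stand-ins for acoustic features from two mismatched corpora
# (assumed data, not taken from the paper).
rng = np.random.default_rng(1)
source = rng.normal(0.5, 1.0, (200, 10))   # larger source corpus
target = rng.normal(1.5, 1.5, (50, 10))    # smaller, differently distributed target

# Pool both corpora (no labels needed) so the hidden layer is fit to both.
params, losses = train_denoising_autoencoder(np.vstack([source, target]))
source_shared = encode(params, source)
target_shared = encode(params, target)
```

A VAD classifier could then be trained on `source_shared` with source labels and applied to `target_shared`. Whether this helps in practice depends on the data and on the specific transfer scheme, which the abstract leaves unspecified.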

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing