Voice activity detection in the wild: A data-driven approach using teacher-student training

Heinrich Dinkel; Shuai Wang; Xuenan Xu; Mengyue Wu; Kai Yu

arXiv:2105.04065·cs.SD·May 11, 2021

TL;DR

This paper introduces a data-driven teacher-student framework for voice activity detection that leverages weak labels and large-scale real-world data to improve performance in noisy environments.

Contribution

It presents a novel teacher-student training approach for VAD that requires only weak labels, enabling effective training on noisy, real-world datasets.

Findings

1. Significant improvements in noisy and real-world data scenarios.

2. Outperforms existing unsupervised and supervised VAD methods.

3. Effective utilization of large-scale, unconstrained audio datasets.

Abstract

Voice activity detection (VAD) is an essential pre-processing component for speech-related tasks such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain frame-level labels from an ASR pipeline by using, e.g., a Hidden Markov model. These ASR models are commonly trained on clean and fully transcribed data, which restricts VAD training to clean or synthetically noised datasets. Therefore, a major challenge for supervised VAD systems is their generalization towards noisy, real-world data. This work proposes a data-driven teacher-student approach for VAD, which utilizes vast and unconstrained audio data for training. Unlike previous approaches, only weak labels during teacher training are required, enabling the utilization of any real-world, potentially noisy dataset. Our approach first trains a teacher model on a source dataset (Audioset) using clip-level…
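The abstract is cut off before it describes the training steps in detail, so the following is only a minimal sketch of how weakly supervised teacher-student training is often set up, not the authors' implementation. It assumes a teacher that outputs per-frame class probabilities, a pooling step that turns them into a clip-level prediction so that a clip-level (weak) label can supervise it, and a thresholding step that turns teacher outputs into frame-level targets for a student. The pooling function (linear softmax), the threshold of 0.5, the class layout, and all function names are assumptions chosen for illustration.

```python
import numpy as np

EPS = 1e-7  # numerical guard, an illustrative choice


def clip_probability(frame_probs: np.ndarray) -> np.ndarray:
    """Pool frame-level probabilities of shape (T, C) into clip-level (C,).

    Linear softmax pooling is an assumption here: frames with higher
    probability receive proportionally more weight.
    """
    return (frame_probs ** 2).sum(axis=0) / (frame_probs.sum(axis=0) + EPS)


def clip_bce(frame_probs: np.ndarray, clip_labels: np.ndarray) -> float:
    """Binary cross-entropy between pooled predictions and weak clip labels."""
    p = np.clip(clip_probability(frame_probs), EPS, 1.0 - EPS)
    loss = -(clip_labels * np.log(p) + (1.0 - clip_labels) * np.log(1.0 - p))
    return float(loss.mean())


def frame_pseudo_labels(frame_probs: np.ndarray,
                        speech_index: int = 0,
                        threshold: float = 0.5) -> np.ndarray:
    """Turn teacher frame probabilities into hard speech/non-speech targets.

    The speech class index and the threshold are hypothetical.
    """
    return (frame_probs[:, speech_index] >= threshold).astype(np.int64)


# Toy teacher output: 4 frames, 2 classes (column 0 = speech, assumed).
teacher_frame_probs = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.7],
    [0.1, 0.9],
])
clip_labels = np.array([1.0, 1.0])  # weak label: both classes occur in the clip

teacher_loss = clip_bce(teacher_frame_probs, clip_labels)
student_targets = frame_pseudo_labels(teacher_frame_probs)
```

In this toy setup the teacher only ever sees `clip_labels`, while the student would be trained on `student_targets`, which have one entry per frame. A real system would replace the fixed array with a neural network's output and could use soft teacher probabilities instead of hard thresholds.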

Code & Models

Repositories

richermans/datadriven-GPVAD (PyTorch, official)
