End-to-end Domain-Adversarial Voice Activity Detection

Marvin Lavechin; Marie-Philippe Gill; Ruben Bousbib; Hervé Bredin; Leibny Paola Garcia-Perera

arXiv:1910.10655 · eess.AS · May 27, 2020 · Interspeech

TL;DR

This paper introduces an end-to-end neural network that performs voice activity detection directly from the waveform, reaches state-of-the-art performance on the DIHARD dataset, and uses adversarial training to improve robustness to domain mismatch. An open-source implementation is available.

Contribution

The paper presents an end-to-end VAD model that combines trainable filters with recurrent layers, adds a domain classification branch trained adversarially to handle domain mismatch, and provides an open-source, reproducible implementation.
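
One common way to train a domain branch adversarially is a gradient reversal layer: it passes features through unchanged in the forward pass and flips the sign of the gradient in the backward pass, so the shared features are pushed to become less informative about the domain. The sketch below illustrates that idea on a toy scalar model with hand-written gradients. It is an assumption made for illustration, not the paper's implementation; this summary does not state which mechanism the authors use, and every name here (`GradientReversal`, `shared_feature_gradient`, `lam`, the scalar weights) is hypothetical.

```python
class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam in the backward pass."""

    def __init__(self, lam):
        self.lam = lam  # hypothetical trade-off weight for the adversarial term

    def forward(self, h):
        return h

    def backward(self, grad_output):
        return -self.lam * grad_output


def shared_feature_gradient(x, w, a, b, y, d, lam):
    """Gradient of the combined objective with respect to the shared weight w.

    Toy model (an assumption, not the paper's network):
      h       = w * x        shared feature
      vad_out = a * h        speech / non-speech head, target y
      dom_out = b * h        domain head, target d, fed through gradient reversal
    Both heads use the squared-error loss 0.5 * (out - target) ** 2.
    """
    grl = GradientReversal(lam)
    h = w * x
    vad_out = a * h
    dom_out = b * grl.forward(h)

    # Gradient of each loss with respect to the shared feature h.
    dvad_dh = (vad_out - y) * a
    ddom_dh = grl.backward((dom_out - d) * b)  # sign flipped by the reversal layer

    # Chain rule back to the shared weight.
    return (dvad_dh + ddom_dh) * x


# The VAD head is already correct here (a * h == y), so only the reversed
# domain gradient reaches the shared weight.
print(shared_feature_gradient(x=2.0, w=0.5, a=1.0, b=1.0, y=1.0, d=0.0, lam=1.0))  # -2.0
```

With `lam=0.0` the domain branch has no influence on the shared weight, which recovers plain VAD training in this toy setting.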

Findings

01 State-of-the-art performance on the DIHARD dataset

02 Adversarial training improves out-domain robustness by over 10%

03 The end-to-end model outperforms a variant in which the trainable filters are replaced by standard cepstral coefficients

Abstract

Voice activity detection is the task of detecting speech regions in a given audio stream or recording. First, we design a neural network combining trainable filters and recurrent layers to tackle voice activity detection directly from the waveform. Experiments on the challenging DIHARD dataset show that the proposed end-to-end model reaches state-of-the-art performance and outperforms a variant where trainable filters are replaced by standard cepstral coefficients. Our second contribution aims at making the proposed voice activity detection model robust to domain mismatch. To that end, a domain classification branch is added to the network and trained in an adversarial manner. The same DIHARD dataset, drawn from 11 different domains, is used for evaluation under two scenarios. In the in-domain scenario where the training and test sets cover the exact same domains, we show that the…
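
The abstract's opening sentence defines the task as detecting speech regions. A model of this kind typically produces a speech score per frame, which then has to be turned into time regions. The following is a minimal sketch of that post-processing step, assuming a simple fixed threshold; the function name `scores_to_regions`, the `frame_step` parameter, and the threshold value are hypothetical and are not taken from the paper or its repository.

```python
def scores_to_regions(scores, frame_step, threshold=0.5):
    """Return (start, end) times in seconds for runs of frames scoring above threshold.

    scores     : per-frame speech scores, assumed to lie in [0, 1]
    frame_step : duration in seconds between consecutive frames (assumed constant)
    """
    regions = []
    start = None  # index of the first frame of the current speech run, if any
    for i, score in enumerate(scores):
        is_speech = score > threshold
        if is_speech and start is None:
            start = i  # a speech run begins
        elif not is_speech and start is not None:
            regions.append((start * frame_step, i * frame_step))  # the run ends
            start = None
    if start is not None:
        # The recording ends while speech is still active.
        regions.append((start * frame_step, len(scores) * frame_step))
    return regions


print(scores_to_regions([0.1, 0.9, 0.8, 0.2, 0.7], frame_step=0.5))
# [(0.5, 1.5), (2.0, 2.5)]
```

A single threshold is the simplest choice; practical systems often add smoothing or minimum-duration constraints, which are left out here.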

Code & Models

Repositories

hbredin/DomainAdversarialVoiceActivityDetection
Official

Models

🤗 julien-c/voice-activity-detection
model · ♡ 17

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing