End-to-end Domain-Adversarial Voice Activity Detection
Marvin Lavechin, Marie-Philippe Gill, Ruben Bousbib, Hervé Bredin, Leibny Paola Garcia-Perera

TL;DR
This paper introduces an end-to-end neural network for voice activity detection that achieves state-of-the-art results and enhances robustness to domain mismatch using adversarial training, with an open-source implementation.
Contribution
It presents a novel end-to-end VAD model with trainable filters, incorporates adversarial domain adaptation, and provides a reproducible pipeline for diverse datasets.
Findings
State-of-the-art performance on the DIHARD dataset
Adversarial training improves out-domain robustness by over 10%
Model outperforms cepstral coefficient-based variants
Abstract
Voice activity detection is the task of detecting speech regions in a given audio stream or recording. First, we design a neural network combining trainable filters and recurrent layers to tackle voice activity detection directly from the waveform. Experiments on the challenging DIHARD dataset show that the proposed end-to-end model reaches state-of-the-art performance and outperforms a variant where trainable filters are replaced by standard cepstral coefficients. Our second contribution aims at making the proposed voice activity detection model robust to domain mismatch. To that end, a domain classification branch is added to the network and trained in an adversarial manner. The same DIHARD dataset, drawn from 11 different domains, is used for evaluation under two scenarios. In the in-domain scenario where the training and test sets cover the exact same domains, we show that the…
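The abstract says a domain classification branch is trained "in an adversarial manner" but does not spell out the mechanism in the excerpt above. One common way to do this is a gradient reversal step: the domain branch learns to predict the domain, while the shared layers receive the negated domain gradient so that their features become less domain-specific. The following is a minimal scalar sketch under that assumption. The function names, the weight `lam`, and all numeric values are hypothetical and are not taken from the paper or its implementation.

```python
# Hypothetical sketch of a gradient-reversal style update for shared layers.
# Assumption: adversarial training is realised by negating the domain-loss
# gradient before it reaches the shared parameters. Not the paper's code.

def adversarial_feature_gradient(grad_vad, grad_domain, lam):
    """Combine the two gradients that reach the shared parameters.

    grad_vad:    gradient of the speech/non-speech loss
    grad_domain: gradient of the domain classification loss
    lam:         assumed trade-off weight for the adversarial term
    """
    # The domain gradient is subtracted, so a descent step on the result
    # lowers the VAD loss while pushing the domain loss up.
    return [gv - lam * gd for gv, gd in zip(grad_vad, grad_domain)]


def sgd_step(params, grads, lr):
    """Plain gradient descent step on a list of scalar parameters."""
    return [p - lr * g for p, g in zip(params, grads)]


# Made-up numbers, for illustration only.
shared = [0.5, -0.2]
grad_vad = [0.1, 0.3]
grad_domain = [0.4, -0.2]

combined = adversarial_feature_gradient(grad_vad, grad_domain, lam=0.5)
updated = sgd_step(shared, combined, lr=0.1)
```

Setting `lam` to 0 reduces the update to ordinary training on the voice activity detection loss alone, which makes the adversarial term easy to switch off for comparison.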
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
