SG-VAD: Stochastic Gates Based Speech Activity Detection

Jonathan Svirsky; Ofir Lindenbaum

arXiv:2210.16022 · eess.AS · October 31, 2022

TL;DR

This paper introduces SG-VAD, a low-resource speech activity detection model that uses stochastic gates to identify nuisance features, outperforming previous methods with a compact architecture.

Contribution

The paper presents a novel VAD model that treats speech detection as a denoising task, combining a lightweight design with improved detection performance.

Findings

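01 Outperforms previously proposed VAD methods on the AVA-Speech evaluation set and gives comparative results on the HAVIC dataset.

02 Contains only 7.8K parameters, making it suitable for low-resource environments.

03 Presents the model architecture, experimental results, and an ablation study of the model's components.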

Abstract

We propose a novel voice activity detection (VAD) model in a low-resource environment. Our key idea is to model VAD as a denoising task, and construct a network that is designed to identify nuisance features for a speech classification task. We train the model to simultaneously identify irrelevant features while predicting the type of speech event. Our model contains only 7.8K parameters, outperforms the previously proposed methods on the AVA-Speech evaluation set, and provides comparative results on the HAVIC dataset. We present its architecture, experimental results, and ablation study on the model's components. We publish the code and the models here https://www.github.com/jsvir/vad.
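
The abstract does not spell out how the gates work, so the following is only a minimal sketch of a generic stochastic gate with a Gaussian relaxation, not the authors' implementation. It assumes one learnable gate mean per feature, a fixed noise scale `SIGMA = 0.5`, and a penalty that sums the probability of each gate being open; all function names and numbers here are hypothetical and chosen for illustration. For the actual model, see the linked repository.

```python
import math
import random

SIGMA = 0.5  # assumed noise scale for the relaxed gates (illustrative value)


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sample_gates(mu, rng, sigma=SIGMA):
    """Training-time gates: add Gaussian noise to each mean, then clip to [0, 1]."""
    return [min(1.0, max(0.0, m + sigma * rng.gauss(0.0, 1.0))) for m in mu]


def deterministic_gates(mu):
    """Inference-time gates: drop the noise and clip the means to [0, 1]."""
    return [min(1.0, max(0.0, m)) for m in mu]


def open_gate_penalty(mu, sigma=SIGMA):
    """Sum over features of P(gate > 0); minimizing it pushes gates toward closed."""
    return sum(normal_cdf(m / sigma) for m in mu)


def apply_gates(features, gates):
    """Multiply each feature by its gate, so a closed gate zeroes the feature."""
    return [x * z for x, z in zip(features, gates)]


# Hypothetical example: three features with gate means that are open, undecided, closed.
rng = random.Random(0)
mu = [1.5, 0.5, -1.0]
features = [0.8, -0.3, 2.0]

train_gates = sample_gates(mu, rng)   # noisy gates, as would be used during training
eval_gates = deterministic_gates(mu)  # [1.0, 0.5, 0.0]
gated = apply_gates(features, eval_gates)
penalty = open_gate_penalty(mu)
```

In a setup like this, the penalty would be added to the classification loss with some weight, so the network is encouraged to close gates on features that do not help predict the speech label while keeping the informative ones open.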

Code & Models

Repositories

jsvir/vad (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing