MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection

Fei Jia; Somshubra Majumdar; Boris Ginsburg

arXiv:2010.13886 · eess.AS · February 15, 2021 · 5 cites

TL;DR

MarbleNet is a deep residual network built from 1D time-channel separable convolutions. It matches the voice activity detection performance of a state-of-the-art model with roughly one tenth of the parameters, and ablation studies examine its robustness in real-world VAD tasks.

Contribution

The paper introduces MarbleNet, a deep residual network built from 1D time-channel separable convolutions, which reduces the parameter count to roughly one tenth of a state-of-the-art VAD model while achieving similar performance.

Findings

01 Achieves performance similar to a state-of-the-art VAD model

02 Uses roughly 1/10th the parameters of that model

03 Studies robustness in real-world VAD tasks through extensive ablations over training methods and parameter choices

Abstract

We present MarbleNet, an end-to-end neural network for Voice Activity Detection (VAD). MarbleNet is a deep residual network composed from blocks of 1D time-channel separable convolution, batch-normalization, ReLU and dropout layers. When compared to a state-of-the-art VAD model, MarbleNet is able to achieve similar performance with roughly 1/10-th the parameter cost. We further conduct extensive ablation studies on different training methods and choices of parameters in order to study the robustness of MarbleNet in real-world VAD tasks.
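
To make the parameter savings of a time-channel separable convolution concrete, the sketch below counts the weights in a standard 1D convolution and in its separable counterpart (a depthwise convolution over time followed by a pointwise 1x1 convolution across channels). The channel counts and kernel width are hypothetical values chosen for illustration and are not taken from the paper. The per-layer ratio computed here is also not the source of the paper's 1/10 figure, which compares whole models.

```python
def standard_conv1d_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weight count of a standard 1D convolution (bias omitted)."""
    return c_in * c_out * kernel_size


def separable_conv1d_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weight count of a time-channel separable 1D convolution (bias omitted).

    Depthwise step: one kernel of length kernel_size per input channel (time axis).
    Pointwise step: a 1x1 convolution that mixes channels.
    """
    depthwise = c_in * kernel_size
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not from the paper).
c_in, c_out, kernel_size = 64, 64, 13

standard = standard_conv1d_params(c_in, c_out, kernel_size)
separable = separable_conv1d_params(c_in, c_out, kernel_size)
ratio = separable / standard

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {ratio:.3f}")
```

In general the ratio equals 1/c_out + 1/kernel_size, so the saving grows with both wider layers and longer kernels.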

Code & Models

Models

🤗 nvidia/Frame_VAD_Multilingual_MarbleNet_v2.0
model · 6.4k downloads · 38 likes

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing