NAS-VAD: Neural Architecture Search for Voice Activity Detection

Daniel Rho, Jinhyeok Park, and Jong Hwan Ko

arXiv:2201.09032 · cs.SD · October 5, 2022

TL;DR

This paper introduces NAS-VAD, a neural architecture search framework tailored to voice activity detection. The architectures it finds automatically outperform manually designed models on noisy and real-world datasets.

Contribution

First application of neural architecture search to voice activity detection, with a novel search space and macro structure that enhances performance and generalization.

Findings

01

Outperforms previous state-of-the-art VAD models in noisy conditions

02

Achieves better generalization on unseen datasets

03

Introduces a new NAS framework with a broader operation search space

Abstract

Various neural network-based approaches have been proposed for more robust and accurate voice activity detection (VAD). Manual design of such neural architectures is an error-prone and time-consuming process, which prompted the development of neural architecture search (NAS), which automatically designs and optimizes network architectures. While NAS has been successfully applied to improve performance in a variety of tasks, it has not yet been exploited in the VAD domain. In this paper, we present the first work that utilizes NAS approaches on the VAD task. To effectively search architectures for the VAD task, we propose a modified macro structure and a new search space with a much broader range of operations that includes attention operations. The results show that the network structures found by the proposed NAS framework outperform previous manually designed state-of-the-art VAD models in…

Code & Models

Repositories

daniel03c1/nas_vad
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis