MUSAN: A Music, Speech, and Noise Corpus

David Snyder; Guoguo Chen; Daniel Povey

arXiv:1510.08484 · cs.SD · October 30, 2015 · 922 cites

TL;DR

This paper presents MUSAN, a publicly available corpus of music, speech, and noise intended for training and evaluating voice activity detection (VAD) and music/speech discrimination models.

Contribution

It introduces a new corpus containing music from several genres, speech in twelve languages, and a wide assortment of technical and non-technical noises, suitable for training VAD and music/speech discrimination systems.
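As a rough illustration of how the corpus's three categories can act as class labels, the sketch below pairs each audio file with its category. It assumes the corpus is unpacked into top-level `music/`, `speech/`, and `noise/` directories of `.wav` files; the function name `labeled_files` is hypothetical and not part of any MUSAN tooling.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: the corpus root holds "music", "speech", and "noise"
# directories, each containing .wav files (possibly in subdirectories).
CLASSES = ("music", "speech", "noise")


def labeled_files(root: str) -> List[Tuple[str, str]]:
    """Return (path, label) pairs, labeling each .wav by its top-level directory."""
    pairs: List[Tuple[str, str]] = []
    for label in CLASSES:
        class_dir = Path(root) / label
        if not class_dir.is_dir():
            continue  # skip categories that are missing from this copy
        for wav in sorted(class_dir.rglob("*.wav")):
            pairs.append((str(wav), label))
    return pairs
```

Tying the label to the directory rather than to file names means subdirectories can be added or removed without changing the code.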

Findings

01

Demonstrated for music/speech discrimination on broadcast news

02

Demonstrated for voice activity detection (VAD) in speaker identification

03

Released under a flexible Creative Commons license

Abstract

This report introduces a new corpus of music, speech, and noise. This dataset is suitable for training models for voice activity detection (VAD) and music/speech discrimination. Our corpus is released under a flexible Creative Commons license. The dataset consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises. We demonstrate use of this corpus for music/speech discrimination on Broadcast news and VAD for speaker identification.
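To make the music/speech discrimination use case concrete, here is a toy sketch, not the method used in the paper: one diagonal-covariance Gaussian per class is fit to labeled feature frames (for example, frames drawn from the corpus's music and speech portions), and a segment is assigned to the class with the highest summed frame log-likelihood. The names `DiagGaussian`, `train`, and `classify_segment`, and the two-dimensional features in the usage lines, are hypothetical placeholders.

```python
import math
from typing import Dict, List, Sequence


class DiagGaussian:
    """Single Gaussian with a diagonal covariance, fit by maximum likelihood."""

    def __init__(self, frames: Sequence[Sequence[float]], floor: float = 1e-3):
        n = len(frames)
        dim = len(frames[0])
        self.mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        # Variance floor avoids unbounded likelihoods for near-constant features.
        self.var = [
            max(sum((f[d] - self.mean[d]) ** 2 for f in frames) / n, floor)
            for d in range(dim)
        ]

    def log_likelihood(self, x: Sequence[float]) -> float:
        # Sum of independent per-dimension Gaussian log densities.
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


def train(data: Dict[str, List[Sequence[float]]]) -> Dict[str, DiagGaussian]:
    """Fit one model per class label (e.g. "music", "speech", "noise")."""
    return {label: DiagGaussian(frames) for label, frames in data.items()}


def classify_segment(
    models: Dict[str, DiagGaussian], frames: Sequence[Sequence[float]]
) -> str:
    """Pick the class whose model gives the segment the highest total log-likelihood."""
    totals = {
        label: sum(m.log_likelihood(f) for f in frames)
        for label, m in models.items()
    }
    return max(totals, key=totals.get)


models = train({
    "music": [[0.0, 1.0], [0.2, 1.1], [-0.1, 0.9]],
    "speech": [[3.0, -1.0], [3.2, -0.8], [2.9, -1.1]],
})
print(classify_segment(models, [[3.1, -0.9], [2.8, -1.0]]))  # prints: speech
```

In practice the frames would be acoustic features such as MFCCs, and a richer model (for instance, a mixture of several Gaussians per class) would typically replace the single Gaussian shown here.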

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing