Voices Obscured in Complex Environmental Settings (VOICES) corpus

Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeff Hetherly, Cory Stephenson, and Karl Ni

arXiv:1804.05053·cs.SD·May 17, 2018·43 cites

TL;DR

The VOICES corpus provides a large, real-world dataset of speech recorded in complex environmental settings to advance distant microphone speech processing and recognition research.

Contribution

This paper introduces the VOICES corpus, a comprehensive, real-world dataset of far-field speech in noisy environments, enabling more realistic training and evaluation of speech models.

Findings

01. 120 hours of multi-microphone recordings in furnished rooms
02. Diverse noise and room conditions captured in the dataset
03. Supports development of robust distant-microphone speech processing

Abstract

This paper introduces the Voices Obscured In Complex Environmental Settings (VOICES) corpus, a freely available dataset released under Creative Commons BY 4.0. This dataset will promote speech and signal processing research on speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora consist mostly of isolated speech captured by close-range microphones. A typical approach to better represent realistic scenarios is to convolve clean speech with a simulated room response and add noise before model training. Despite these efforts, model performance degrades when tested against uncurated speech in natural conditions. For this corpus, audio was recorded in furnished rooms with background noise played in conjunction with foreground speech selected from the LibriSpeech corpus. Multiple sessions were recorded in each room to accommodate all foreground speech-background…
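
The simulation approach the abstract contrasts VOICES with (reverberating clean speech with a room response, then mixing in noise) can be sketched as follows. This is a minimal illustration, not the VOICES authors' code: the function names `convolve`, `power`, and `simulate_far_field`, the toy impulse response, and the choice to scale noise to a target signal-to-noise ratio are all assumptions made for the example. It uses plain Python for transparency; a real pipeline would more likely use an FFT-based convolution.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution: output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def power(samples):
    """Mean squared amplitude of a sequence."""
    return sum(s * s for s in samples) / len(samples)


def simulate_far_field(clean, rir, noise, snr_db):
    """Reverberate clean speech with an RIR, then add noise scaled to snr_db.

    Hypothetical helper for illustration. The reverberant signal is
    truncated to len(clean) samples so it stays aligned with the original
    utterance; noise must be at least that long or ValueError is raised.
    """
    reverberant = convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    if len(noise) < len(reverberant):
        raise ValueError("noise must be at least as long as the clean signal")
    # Pick gain g so that 10*log10(P_speech / (g^2 * P_noise)) == snr_db.
    gain = math.sqrt(power(reverberant) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(reverberant, noise)]


# Toy usage: a 440 Hz tone stands in for speech (16 kHz, 0.1 s).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
rir = [1.0, 0.0, 0.5, 0.0, 0.25]  # toy decaying echoes, not a measured room
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
mixed = simulate_far_field(clean, rir, noise, snr_db=10.0)
```

Truncating the reverberant output to the clean length is a simplification that drops the reverberation tail; the corpus itself sidesteps this whole step by recording played-back speech and noise in real rooms.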

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing