Voices Obscured in Complex Environmental Settings (VOICES) corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeff Hetherly, Cory Stephenson, and Karl Ni

TL;DR
The VOICES corpus provides a large, real-world dataset of speech recorded in complex environmental settings to advance distant microphone speech processing and recognition research.
Contribution
This paper introduces the VOICES corpus, a comprehensive, real-world dataset of far-field speech in noisy environments, enabling more realistic training and evaluation of speech models.
Findings
120 hours of multi-microphone recordings in furnished rooms
Diverse noise and room conditions captured in the dataset
Supports development of robust distant microphone speech processing
Abstract
This paper introduces the Voices Obscured in Complex Environmental Settings (VOICES) corpus, a freely available dataset released under the Creative Commons BY 4.0 license. This dataset will promote speech and signal processing research on speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech recorded with close-range microphones. A typical approach to better represent realistic scenarios is to convolve clean speech with a simulated room response and mix in noise for model training. Despite these efforts, model performance degrades when tested against uncurated speech in natural conditions. For this corpus, audio was recorded in furnished rooms with background noise played in conjunction with foreground speech selected from the LibriSpeech corpus. Multiple sessions were recorded in each room to accommodate all foreground speech-background…
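The abstract contrasts VOICES with the common practice of simulating far-field training data from clean speech. A minimal sketch of that conventional simulation follows, assuming mono signals stored as Python lists of floats at a shared sample rate, a known room impulse response (RIR), and SNR defined as the ratio of mean signal power to mean noise power in decibels. All function names are hypothetical, and this illustrates the simulation baseline the abstract describes, not the VOICES recording procedure.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean power (average squared amplitude) of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) == snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]  # trim noise to the speech length
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]


def simulate_far_field(clean, rir, noise, snr_db):
    """Reverberate clean speech with an RIR, then mix in noise at snr_db."""
    reverberant = convolve(clean, rir)
    return add_noise_at_snr(reverberant, noise, snr_db)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical inputs: a 0.1 s 440 Hz tone at 16 kHz stands in for speech.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    rir = [1.0, 0.0, 0.5, 0.0, 0.25]  # toy RIR: direct path plus two echoes
    noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
    mixture = simulate_far_field(clean, rir, noise, snr_db=10.0)
    print(len(mixture))
```

The direct nested-loop convolution costs O(NM) and is only practical for toy signals; a real augmentation pipeline would more likely use FFT-based convolution with measured or simulated RIRs. The paper's point is that data produced this way still underrepresents natural far-field conditions.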
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
