TL;DR
This paper presents the DIRHA-English corpus, a comprehensive multi-microphone dataset from domestic environments, and demonstrates its utility for distant-speech recognition research with baseline results using advanced neural network techniques.
Contribution
Introduction of the DIRHA-English corpus with real and simulated data for domestic distant-speech recognition tasks, along with baseline results using state-of-the-art neural network methods.
Findings
Baseline distant-speech recognition results are reported using deep neural networks.
The corpus supports various speech recognition tasks.
The dataset includes diverse acoustic conditions and speaker data.
Abstract
This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and 12 UK English native speakers. Each speaker uttered different sets of phonetically-rich sentences, newspaper articles, conversational speech, keywords, and commands. From this material, a large set of 1-minute sequences was generated, which also includes typical domestic background noise as well as inter/intra-room reverberation effects. Dev and test sets were derived, which constitute valuable material for different studies on multi-microphone speech processing and distant-speech recognition. Various tasks and…
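The abstract says the simulated material combines reverberation effects with domestic background noise. One common way to produce such data, shown here as a minimal sketch and not as the authors' actual pipeline, is to convolve a clean recording with a room impulse response and then add noise scaled to a target signal-to-noise ratio. The function names, the toy impulse response, and the `snr_db` parameter are all assumptions made for illustration.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (pure Python, O(N*M))."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(samples):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in samples) / len(samples)


def simulate_distant_speech(clean, impulse_response, noise, snr_db):
    """Reverberate `clean`, then add `noise` scaled to the requested SNR.

    Hypothetical helper: the SNR is measured between the reverberated
    speech and the scaled noise, which is an assumption of this sketch.
    """
    reverberated = convolve(clean, impulse_response)
    if len(noise) < len(reverberated):
        raise ValueError("noise must be at least as long as the reverberated signal")
    noise = noise[:len(reverberated)]
    # Gain that makes power(reverberated) / power(gain * noise) match snr_db.
    gain = math.sqrt(power(reverberated) / (power(noise) * 10 ** (snr_db / 10)))
    return [r + gain * n for r, n in zip(reverberated, noise)]


# Toy inputs: a 440 Hz tone at 16 kHz, a made-up 5-tap impulse response,
# and Gaussian noise standing in for recorded background noise.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
impulse_response = [1.0, 0.0, 0.5, 0.0, 0.25]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(len(clean) + len(impulse_response) - 1)]
mixed = simulate_distant_speech(clean, impulse_response, noise, snr_db=10.0)
```

In practice the impulse response would be measured in the target room for each microphone position, and the noise would come from real recordings. Both are replaced by toy values here.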
