Realistic multi-microphone data simulation for distant speech recognition

Mirco Ravanelli; Piergiorgio Svaizer; Maurizio Omologo

arXiv:1711.09470·eess.AS·November 28, 2017

TL;DR

This paper presents a method for generating realistic multi-microphone simulated data for distant speech recognition, and reports recognition performance comparable to that obtained with real data across various models and techniques.

Contribution

The authors introduce a new approach to simulate realistic multi-microphone distant speech data, improving coherence with real-world conditions and aiding research without extensive real data collection.
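
This summary does not describe the simulation pipeline itself. As a rough illustration only, a common way to simulate distant speech is to convolve a close-talk signal with one impulse response per microphone and then add noise at a chosen signal-to-noise ratio. The sketch below assumes that generic recipe; it is not taken from the paper or from the linked repository, and the names `convolve`, `power`, `simulate_array` and `snr_db` are hypothetical.

```python
import math


def convolve(signal, impulse_response):
    """Full linear convolution of two 1-D sequences (plain Python, O(N*M))."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


def simulate_array(clean, impulse_responses, noises, snr_db):
    """Return one contaminated signal per microphone.

    clean             -- close-talk speech samples
    impulse_responses -- one impulse response (list of floats) per microphone
    noises            -- one noise sequence per microphone, at least as long
                         as the reverberated signal and not all zeros
    snr_db            -- target ratio of reverberated speech to noise, in dB
    """
    channels = []
    for h, noise in zip(impulse_responses, noises):
        reverberated = convolve(clean, h)
        noise = noise[:len(reverberated)]
        # Scale the noise so the speech-to-noise power ratio equals snr_db.
        gain = math.sqrt(power(reverberated) / (power(noise) * 10 ** (snr_db / 10)))
        channels.append([r + gain * n for r, n in zip(reverberated, noise)])
    return channels
```

In practice the impulse responses and noise would come from measurements or a room simulator, and an FFT-based convolution would replace the quadratic loop; both are left out here to keep the sketch dependency-free.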

Findings

01

Simulated data yields recognition performance trends similar to those obtained with real data.

02

The approach is effective across different acoustic models and processing techniques.

03

Realistic simulation reduces the need for laborious real environment recordings.

Abstract

The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology. The reliability, flexibility and low computational cost of a data simulation process may ultimately allow researchers to train, tune and test different techniques in a variety of acoustic scenarios, avoiding the laborious effort of directly recording real data from the targeted environment. In the last decade, several simulated corpora have been released to the research community, including the data-sets distributed in the context of projects and international challenges, such as CHiME and REVERB. These efforts were extremely useful to derive baselines and common evaluation frameworks for comparison purposes. At the same time, in many cases they highlighted the need for better coherence between real and simulated conditions. In this paper, we…

Code & Models

Repositories

mravanelli/pySpeechRev
Official