RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications
Adriana Stan

TL;DR
RECOApy is a tool that streamlines data recording, pre-processing, and phonetic transcription for end-to-end speech applications in multiple languages, with the aim of improving the quality of the data used to train speech models.
Contribution
The paper introduces an easy-to-use, multilingual tool whose grapheme-to-phoneme (G2P) converters are based on deep neural networks and cover languages with differing orthographies. It also makes the associated speech processing resources publicly available.
Findings
G2P converters achieve low phoneme and word error rates.
The tool supports eight languages with diverse orthographies.
Phonetic lexicons and models are freely available.
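As an illustration of the two metrics named in the findings, the sketch below computes a phoneme error rate (PER) and a word error rate (WER) for G2P output. It assumes the conventional definitions: PER as the phoneme-level edit distance divided by the number of reference phonemes, and WER as the fraction of words whose predicted transcription differs from the reference. The summary does not state how the paper computes them, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Total phoneme edits divided by total reference phonemes (assumed definition)."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


def word_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Fraction of words with at least one phoneme error (assumed definition)."""
    wrong = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return wrong / len(refs)


# Toy data, invented for illustration only.
refs = [["k", "ae", "t"], ["d", "ao", "g"]]
hyps = [["k", "ae", "t"], ["d", "aa", "g"]]
per = phoneme_error_rate(refs, hyps)
wer = word_error_rate(refs, hyps)
```

With this toy data, one substituted phoneme out of six gives a PER of 1/6, and one wrong word out of two gives a WER of 0.5.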
Abstract
Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet, recent studies show that good quality speech resources and phonetic transcription of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained…
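The utterance-level normalisation and silence trimming mentioned in the abstract can be pictured with the minimal sketch below. It is not RECOApy's implementation: peak normalisation and a fixed amplitude threshold over short frames are assumed here as one simple way to do it, and the function names, the threshold and the frame length are hypothetical.

```python
from typing import List


def peak_normalise(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale an utterance so that its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-zero or empty signal: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def trim_silence(samples: List[float], threshold: float = 0.01,
                 frame_len: int = 4) -> List[float]:
    """Drop leading and trailing frames whose peak amplitude is below threshold."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    voiced = [i for i, f in enumerate(frames)
              if max(abs(s) for s in f) >= threshold]
    if not voiced:
        return []  # the whole utterance is below the threshold
    start = voiced[0] * frame_len
    end = min((voiced[-1] + 1) * frame_len, len(samples))
    return list(samples[start:end])


# Toy utterance: silence, a short burst, silence (values invented for illustration).
utterance = [0.0] * 4 + [0.1, -0.45, 0.2, 0.0] + [0.0] * 4
normalised = peak_normalise(utterance)
trimmed = trim_silence(normalised)
```

Real recordings would use much longer frames (tens of milliseconds) and often an energy-based threshold in decibels; the tiny frame length here only keeps the toy example readable.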
