RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Adriana Stan

arXiv:2009.05493·eess.AS·September 16, 2020

TL;DR

RECOApy is a tool that streamlines data recording, pre-processing, and phonetic transcription for end-to-end speech applications in eight languages. It targets the good-quality speech resources and phonetic transcriptions that recent studies show can improve the results of such applications.

Contribution

The paper introduces an easy-to-use, multilingual tool with deep neural network-based grapheme-to-phoneme (G2P) converters covering languages with diverse orthographies, and makes its speech processing resources publicly available.

Findings

01

G2P converters achieve low phoneme and word error rates.

02

The tool supports eight languages with diverse orthographies.

03

Phonetic lexicons and models are freely available.
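
The first finding refers to phoneme and word error rates without defining them on this page. The sketch below shows one common way such metrics are computed for G2P output, assuming the usual definitions: phoneme error rate as total Levenshtein edit distance divided by the total number of reference phonemes, and word error rate as the fraction of words whose predicted transcription differs from the reference in any phoneme. The function names and the toy phoneme sequences are hypothetical and are not taken from RECOApy.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total phoneme edits divided by total reference phonemes (assumed definition)."""
    total_len = sum(len(r) for r in refs)
    if total_len == 0:
        raise ValueError("reference transcriptions are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_len


def word_error_rate(refs, hyps):
    """Fraction of words with at least one phoneme error (assumed definition)."""
    if not refs:
        raise ValueError("no reference transcriptions given")
    wrong = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return wrong / len(refs)


# Toy example with made-up phoneme sequences for two words.
refs = [["k", "ae", "t"], ["d", "ao", "g"]]
hyps = [["k", "ae", "t"], ["d", "aa", "g"]]
per = phoneme_error_rate(refs, hyps)
wer = word_error_rate(refs, hyps)
```

In this toy case one of six reference phonemes is wrong and one of two words is affected, so the word-level rate is much higher than the phoneme-level rate, which is typical for G2P evaluation.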

Abstract

Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet, recent studies show that good quality speech resources and phonetic transcription of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained…
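
The abstract mentions utterance-level normalisation and silence trimming but gives no detail on how RECOApy performs them. As an illustration only, the sketch below shows one simple approach, assuming peak normalisation and amplitude-threshold trimming on a list of float samples in the range [-1, 1]. The function names, the `target_peak` and `threshold` parameters, and their default values are hypothetical and should not be read as the tool's actual behaviour.

```python
def peak_normalise(samples, target_peak=0.9):
    """Scale one utterance so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # An all-zero (or empty) utterance cannot be scaled; return a copy.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose absolute value is below threshold."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


# Toy utterance: quiet edges around two louder samples.
utterance = [0.0, 0.001, 0.5, -0.25, 0.0]
trimmed = trim_silence(utterance)
normalised = peak_normalise(trimmed, target_peak=1.0)
```

A per-sample threshold is the simplest possible criterion. Practical tools often measure energy over short frames instead, so that a single noisy sample does not mark the start of speech.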
