FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator

Massa Baali; Ahmed Ali

arXiv:2306.07936 · eess.AS · June 14, 2023 · 1 citation

Open Access

TL;DR

FOOCTTS is a pipeline that synthesizes Arabic football commentary together with background crowd noise from a small amount of training data. The authors describe it as adaptable to other domains and languages.

Contribution

The paper introduces a domain-specific text-to-speech (TTS) pipeline for Arabic football commentary that reproduces the background crowd noise and requires only 15 minutes of commentator recordings.

Findings

01

Generates speech together with its acoustic environment (crowd noise) from only 15 minutes of football commentator recordings.

02

The prototype is described as generalizable to other domains and languages.

03

Uses Arabic automatic speech recognition for data labeling.

Abstract

This paper presents FOOCTTS, an automatic pipeline for a football commentator that generates speech with background crowd noise. The application takes text from the user and applies text pre-processing such as vowelization before passing it to the commentator's speech synthesizer. The pipeline includes Arabic automatic speech recognition for data labeling, CTC segmentation, transcription vowelization to match the speech, and fine-tuning of the TTS. The system is capable of generating speech with its acoustic environment from only 15 minutes of football commentator recordings. The prototype is generalizable and can be easily applied to different domains and languages.
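
The order of stages named in the abstract can be sketched as plain function composition. This is only an illustrative outline, not the authors' code: `Segment`, `prepare_training_data`, `synthesize_commentary`, and the `demo_*` stubs are hypothetical names, and the stubs stand in for a real Arabic ASR model, a CTC segmentation tool, a vowelization model, and a fine-tuned TTS.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One training example: a time span of the recording plus its text."""
    start_s: float
    end_s: float
    text: str


# Hypothetical stage signatures (assumptions, not from the paper).
Transcriber = Callable[[str], str]             # audio path -> raw transcript
Aligner = Callable[[str, str], List[Segment]]  # audio path, transcript -> segments
Vowelizer = Callable[[str], str]               # text -> vowelized text
Synthesizer = Callable[[str], bytes]           # vowelized text -> audio bytes


def prepare_training_data(audio_path: str, transcribe: Transcriber,
                          align: Aligner, vowelize: Vowelizer) -> List[Segment]:
    """Label, segment, and vowelize a recording for TTS fine-tuning."""
    transcript = transcribe(audio_path)       # ASR used for data labeling
    segments = align(audio_path, transcript)  # CTC segmentation step
    # Vowelize each segment's text so it matches what was spoken.
    return [Segment(s.start_s, s.end_s, vowelize(s.text)) for s in segments]


def synthesize_commentary(text: str, vowelize: Vowelizer,
                          tts: Synthesizer) -> bytes:
    """Inference path: pre-process the user's text, then synthesize."""
    return tts(vowelize(text))


# Placeholder stages so the sketch runs end to end without any models.
def demo_transcribe(audio_path: str) -> str:
    return "goal scored"


def demo_align(audio_path: str, transcript: str) -> List[Segment]:
    # Pretend each word occupies exactly one second of audio.
    return [Segment(float(i), float(i + 1), word)
            for i, word in enumerate(transcript.split())]


def demo_vowelize(text: str) -> str:
    return text.upper()  # stand-in for adding diacritics


def demo_tts(text: str) -> bytes:
    return text.encode("utf-8")  # stand-in for waveform bytes
```

Calling `prepare_training_data("match.wav", demo_transcribe, demo_align, demo_vowelize)` returns one `Segment` per word of the stub transcript. In a real system the resulting segments would be used to fine-tune the TTS model, and the crowd noise would come from the recordings themselves rather than from a separate mixing step.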

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing

Methods: Dense Connections · 1x1 Convolution · Feedforward Network · Two Time-scale Update Rule · Projection Discriminator · Non-Local Operation · Adam · Non-Local Block