FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator
Massa Baali, Ahmed Ali

TL;DR
FOOCTTS is a pipeline that synthesizes Arabic football commentary together with background crowd noise. It needs only about 15 minutes of commentator recordings and can be adapted to other domains and languages.
Contribution
It introduces a fast, domain-specific TTS system for Arabic football commentary with background noise, requiring only 15 minutes of recordings.
Findings
Generates speech together with its background crowd noise after fine-tuning on only 15 minutes of commentator recordings.
System is generalizable to different domains and languages.
Uses Arabic automatic speech recognition for data labeling.
Abstract
This paper presents FOOCTTS, an automatic pipeline for a football commentator that generates speech with background crowd noise. The application takes text from the user and applies text pre-processing such as vowelization, then passes the result to the commentator's speech synthesizer. Our pipeline includes Arabic automatic speech recognition for data labeling, CTC segmentation, transcription vowelization to match the speech, and fine-tuning of the TTS model. Our system is capable of generating speech with its acoustic environment from as little as 15 minutes of football commentator recordings. Our prototype is generalizable and can be easily applied to different domains and languages.
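The stage order described in the abstract can be sketched roughly as follows. This is only an illustrative skeleton, not the authors' code: every name here (`prepare_training_data`, `commentate`, `TrainingSegment`, and the `toy_*` stand-ins) is hypothetical, the real ASR, CTC segmentation, vowelization, and TTS models are passed in as plain callables, and the fine-tuning step itself is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (start_sec, end_sec, text) as produced by an alignment step.
Aligned = Tuple[float, float, str]


@dataclass
class TrainingSegment:
    start: float  # segment start in seconds
    end: float    # segment end in seconds
    text: str     # vowelized transcript for this segment


def prepare_training_data(
    audio_path: str,
    transcribe: Callable[[str], str],
    align: Callable[[str, str], List[Aligned]],
    vowelize: Callable[[str], str],
) -> List[TrainingSegment]:
    """Label, segment, and vowelize a recording (hypothetical helper)."""
    transcript = transcribe(audio_path)     # ASR used for data labeling
    aligned = align(audio_path, transcript)  # e.g. CTC segmentation
    return [TrainingSegment(s, e, vowelize(t)) for s, e, t in aligned]


def commentate(
    text: str,
    vowelize: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Inference path: pre-process user text, then synthesize audio."""
    return synthesize(vowelize(text))


# Toy stand-ins so the sketch runs; real models would replace these.
def toy_transcribe(path: str) -> str:
    return "goal . what a goal"


def toy_align(path: str, transcript: str) -> List[Aligned]:
    # Pretend every sentence occupies a fixed 2-second window.
    sentences = transcript.split(" . ")
    return [(i * 2.0, (i + 1) * 2.0, s) for i, s in enumerate(sentences)]


def toy_vowelize(text: str) -> str:
    return text.upper()  # placeholder for adding Arabic diacritics


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # placeholder for a waveform
```

The segments returned by `prepare_training_data` would then serve as the audio and text pairs for fine-tuning the TTS model. Because the training audio already contains crowd noise, no separate noise-mixing step appears in this sketch.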
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
Methods: Dense Connections · 1x1 Convolution · Feedforward Network · Two Time-scale Update Rule · Projection Discriminator · Non-Local Operation · Adam · Non-Local Block
