The FruitShell French synthesis system at the Blizzard 2023 Challenge
Xin Qi, Xiaopeng Wang, Zhiyong Wang, Wang Liu, Mingming Ding, Shuchen Shi

TL;DR
This paper describes a French text-to-speech system developed for the Blizzard Challenge 2023. It combines data preprocessing, phoneme conversion, and a VITS-based model with speaker adaptation, and its quality scores were around the average of the participating teams.
Contribution
The paper introduces a comprehensive pipeline for French TTS including data cleaning, phoneme standardization, and multi-speaker modeling with a VITS-based architecture.
Findings
Achieved MOS scores of 3.6 on the Hub task and 3.4 on the Spoke task.
The system ranked around the average among participating teams.
Implemented effective data augmentation and phoneme conversion techniques.
Abstract
This paper presents a French text-to-speech synthesis system for the Blizzard Challenge 2023. The challenge consists of two tasks: generating high-quality speech from female speakers and generating speech that closely resembles specific individuals. Regarding the competition data, we conducted a screening process to remove missing or erroneous text data. We organized all symbols except for phonemes and eliminated symbols that had no pronunciation or zero duration. Additionally, we added word boundary and start/end symbols to the text, which we have found to improve speech quality based on our previous experience. For the Spoke task, we performed data augmentation according to the competition rules. We used an open-source G2P model to transcribe the French texts into phonemes. As the G2P model uses the International Phonetic Alphabet (IPA), we applied the same transcription process to…
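The text preparation described in the abstract (dropping symbols with no pronunciation, then adding word-boundary and start/end symbols) could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the marker tokens `<s>`, `</s>`, and `#`, the `UNPRONOUNCED` set, and the function name `prepare_phoneme_sequence` are all assumptions, since the abstract does not name the actual symbols used.

```python
# Hypothetical marker tokens; the paper does not specify the real ones.
BOS, EOS, WORD_BOUNDARY = "<s>", "</s>", "#"

# Hypothetical examples of symbols that carry no pronunciation.
UNPRONOUNCED = {"«", "»", '"', "(", ")"}


def prepare_phoneme_sequence(words):
    """Flatten per-word phoneme lists into one sequence with boundary markers.

    `words` is a list of lists of phoneme strings, one inner list per word.
    Symbols that are empty or listed in UNPRONOUNCED are dropped, and words
    left with no phonemes are skipped entirely.
    """
    sequence = [BOS]
    for phones in words:
        kept = [p for p in phones if p.strip() and p not in UNPRONOUNCED]
        if not kept:
            continue
        # Insert a boundary only between two words, never right after BOS.
        if len(sequence) > 1:
            sequence.append(WORD_BOUNDARY)
        sequence.extend(kept)
    sequence.append(EOS)
    return sequence
```

For example, calling `prepare_phoneme_sequence([["b", "ɔ̃"], ["«"], ["ʒ", "u", "ʁ"]])` skips the quotation mark and places a single boundary symbol between the two remaining words.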
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
