VoiceX: A Text-To-Speech Framework for Custom Voices
Silvan Mertes, Daksitha Withanage Don, Otto Grothe, Johanna Kuch, Ruben Schlagowski, Elisabeth André

TL;DR
VoiceX introduces a framework built on an interactive evolutionary algorithm, with a user-friendly GUI for customizing neural TTS voices, making voice creation accessible without deep technical expertise.
Contribution
The paper presents a novel human-in-the-loop approach using evolutionary algorithms for neural TTS customization, integrated into an accessible GUI and API.
Findings
Effective voice customization demonstrated in user study
User-friendly interface facilitates non-expert voice creation
Custom voices are compatible with existing TTS models
Abstract
Modern TTS systems are capable of creating highly realistic and natural-sounding speech. Despite these developments, the process of customizing TTS voices remains a complex task, mostly requiring the expertise of specialists within the field. One reason for this is the utilization of deep learning models, which are characterized by their expansive, non-interpretable parameter spaces, restricting the feasibility of manual customization. In this paper, we present a novel human-in-the-loop paradigm based on an evolutionary algorithm for directly interacting with the parameter space of a neural TTS model. We integrated our approach into a user-friendly graphical user interface that allows users to efficiently create original voices. Those voices can then be used with the backbone TTS model, for which we provide a Python API. Further, we present the results of a user study exploring the…
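To illustrate the human-in-the-loop idea described in the abstract, here is a minimal Python sketch of an interactive evolutionary loop. It is not the authors' implementation: the excerpt does not state which parameters are evolved or which operators are used, so the flat float vector (standing in for a voice representation), Gaussian mutation, and every function name below are assumptions. In a real tool the `choose` callback would synthesize each candidate and let the user pick by listening; here a simulated listener stands in for the user.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mutate(parent: Sequence[float], sigma: float, rng: random.Random) -> Vector:
    """Return a copy of `parent` with Gaussian noise added to every dimension."""
    return [x + rng.gauss(0.0, sigma) for x in parent]


def evolve_voice(
    initial: Sequence[float],
    choose: Callable[[List[Vector]], int],
    generations: int = 10,
    population_size: int = 6,
    sigma: float = 0.1,
    seed: int = 0,
) -> Vector:
    """Hypothetical interactive evolution loop over a voice parameter vector.

    Each generation offers the current voice plus mutated variants, and the
    `choose` callback (the human in the loop) returns the index of the
    preferred candidate, which becomes the next parent.
    """
    rng = random.Random(seed)
    parent = list(initial)
    for _ in range(generations):
        # Keep the current parent as an option so the user can reject all variants.
        candidates = [parent] + [
            mutate(parent, sigma, rng) for _ in range(population_size - 1)
        ]
        parent = candidates[choose(candidates)]
    return parent


def make_simulated_listener(target: Sequence[float]) -> Callable[[List[Vector]], int]:
    """Stand-in for a user: always prefers the candidate closest to `target`."""

    def choose(candidates: List[Vector]) -> int:
        distances = [squared_distance(c, target) for c in candidates]
        return distances.index(min(distances))

    return choose
```

For example, calling `evolve_voice([0.0] * 4, make_simulated_listener([0.5, -0.2, 0.1, 0.9]), generations=20)` returns a vector at least as close to the target as the starting point, because the current parent is always kept among the candidates.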
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
