VoiceX: A Text-To-Speech Framework for Custom Voices

Silvan Mertes; Daksitha Withanage Don; Otto Grothe; Johanna Kuch; Ruben Schlagowski; Elisabeth André

arXiv:2408.12170 · cs.HC · August 23, 2024

Open Access

TL;DR

VoiceX introduces an interactive evolutionary algorithm-based framework with a user-friendly GUI for customizing neural TTS voices, making voice creation accessible without deep technical expertise.

Contribution

The paper presents a novel human-in-the-loop approach using evolutionary algorithms for neural TTS customization, integrated into an accessible GUI and API.

Findings

01 Effective voice customization demonstrated in user study

02 User-friendly interface facilitates non-expert voice creation

03 Custom voices are compatible with existing TTS models

Abstract

Modern TTS systems are capable of creating highly realistic and natural-sounding speech. Despite these developments, the process of customizing TTS voices remains a complex task, mostly requiring the expertise of specialists within the field. One reason for this is the utilization of deep learning models, which are characterized by their expansive, non-interpretable parameter spaces, restricting the feasibility of manual customization. In this paper, we present a novel human-in-the-loop paradigm based on an evolutionary algorithm for directly interacting with the parameter space of a neural TTS model. We integrated our approach into a user-friendly graphical user interface that allows users to efficiently create original voices. Those voices can then be used with the backbone TTS model, for which we provide a Python API. Further, we present the results of a user study exploring the…
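To make the human-in-the-loop idea in the abstract more concrete, here is a minimal sketch of an interactive evolutionary loop. It is not the VoiceX implementation: the representation of a voice as a plain list of floats, the operators `mutate` and `crossover`, the function `evolve`, and the `simulated_listener` that stands in for a person rating synthesized audio are all assumptions made for illustration. In an actual system, each candidate vector would be rendered to speech by the backbone TTS model and the user would pick the candidates they prefer.

```python
import random
from typing import Callable, List

Vector = List[float]  # hypothetical stand-in for a point in the TTS parameter space


def mutate(parent: Vector, sigma: float, rng: random.Random) -> Vector:
    """Return a copy of parent with Gaussian noise added to each dimension."""
    return [x + rng.gauss(0.0, sigma) for x in parent]


def crossover(a: Vector, b: Vector, rng: random.Random) -> Vector:
    """Uniform crossover: take each dimension from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve(
    population: List[Vector],
    choose: Callable[[List[Vector]], List[int]],
    generations: int,
    sigma: float = 0.1,
    seed: int = 0,
) -> List[Vector]:
    """Run an interactive evolutionary loop.

    `choose` plays the role of the human: given the current candidates,
    it returns the indices of the ones to keep as parents.
    """
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        parents = [population[i] for i in choose(population)]
        children = list(parents)  # keep the selected candidates unchanged
        while len(children) < size:
            a, b = rng.choice(parents), rng.choice(parents)
            children.append(mutate(crossover(a, b, rng), sigma, rng))
        population = children
    return population


def simulated_listener(target: Vector) -> Callable[[List[Vector]], List[int]]:
    """Build a fake 'user' that prefers the two candidates closest to target."""
    def distance(v: Vector) -> float:
        return sum((x - t) ** 2 for x, t in zip(v, target))

    def choose(pop: List[Vector]) -> List[int]:
        return sorted(range(len(pop)), key=lambda i: distance(pop[i]))[:2]

    return choose


if __name__ == "__main__":
    rng = random.Random(1)
    start = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(6)]
    final = evolve(start, simulated_listener([0.5] * 8), generations=20)
    print(len(final), "candidates after 20 rounds")
```

Because the selected parents are carried over unchanged, the best candidate under the listener's preference never gets worse from one round to the next, which is a common design choice when each evaluation costs a human some listening time.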

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques