Expressivity and Speech Synthesis

Andreas Triantafyllopoulos; Björn W. Schuller

arXiv:2404.19363 · cs.CL · April 11, 2025 · 1 cites

Open Access

TL;DR

This paper reviews recent advances in speech synthesis, emphasizing the progress towards expressive, high-fidelity speech generation and discussing societal implications and ethical considerations.

Contribution

It summarizes methodological developments in expressive speech synthesis and explores future directions for creating more complex, emotionally rich speech outputs.

Findings

01 Finds that high-fidelity, expressive speech synthesis appears close to being achieved for single, isolated utterances

02 Identifies the key methodological advances that enabled expressivity in speech synthesis

03 Discusses the societal risks and ethical considerations of expressive speech synthesis (ESS) technology

Abstract

Imbuing machines with the ability to talk has been a longtime pursuit of artificial intelligence (AI) research. From the very beginning, the community has not only aimed to synthesise high-fidelity speech that accurately conveys the semantic meaning of an utterance, but also to colour it with inflections that cover the same range of affective expressions that humans are capable of. After many years of research, it appears that we are on the cusp of achieving this when it comes to single, isolated utterances. This unveils an abundance of potential avenues to explore when it comes to combining these single utterances with the aim of synthesising more complex, longer-term behaviours. In the present chapter, we outline the methodological advances that brought us so far and sketch out the ongoing efforts to reach that coveted next level of artificial expressivity. We also discuss the…

Taxonomy

Topics: Digital Communication and Language