The Role of Vocal Persona in Natural and Synthesized Speech
Camille Noufi, Lloyd May, Jonathan Berger

TL;DR
This paper examines the role of vocal persona in synthesized speech. It proposes a framework for context-dependent vocal expression and reports initial findings from expert interviews, with human-computer interaction applications in mind.
Contribution
It introduces a framework for embedding contextually dependent vocal personas in speech synthesis systems, with a design informed by expert interviews.
Findings
Thematic analysis of 10 interviews with vocal studies and performance experts identified key themes on the role of vocal persona
Framework enables dynamic, context-aware voice modulation
Initial results suggest improved naturalness and expressiveness
Abstract
The inclusion of voice persona in synthesized voice can be significant in a broad range of human-computer-interaction (HCI) applications, including augmentative and assistive communication (AAC), artistic performance, and design of virtual agents. We propose a framework to imbue compelling and contextually-dependent expression within a synthesized voice by introducing the role of the vocal persona within a synthesis system. In this framework, the resultant 'tone of voice' is defined as a point existing within a continuous, contextually-dependent probability space that is traversable by the user of the voice. We also present initial findings of a thematic analysis of 10 interviews with vocal studies and performance experts to further understand the role of the vocal persona within a natural communication ecology. The themes identified are then used to inform the design of the…
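The abstract's idea of a 'tone of voice' as a traversable point in a contextually dependent probability space can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two parameter axes (pitch shift and speaking rate), the context names, the axis-aligned Gaussian form, and the function names `traverse` and `most_likely_context`.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ContextDistribution:
    """Axis-aligned Gaussian over hypothetical tone parameters for one context."""
    mean: Tuple[float, ...]
    std: Tuple[float, ...]

    def log_density(self, point: Tuple[float, ...]) -> float:
        # Sum of independent 1-D Gaussian log densities, one per axis.
        return sum(
            -0.5 * ((x - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
            for x, m, s in zip(point, self.mean, self.std)
        )


# Hypothetical contexts. Axes (assumed): pitch shift in semitones, speaking-rate multiplier.
CONTEXTS = {
    "intimate": ContextDistribution(mean=(-1.0, 0.9), std=(1.0, 0.1)),
    "public_address": ContextDistribution(mean=(2.0, 1.1), std=(1.5, 0.15)),
}


def traverse(context: str, direction: Tuple[float, ...], steps_in_std: float) -> Tuple[float, ...]:
    """Move away from the context mean along `direction`, measured in standard deviations."""
    dist = CONTEXTS[context]
    return tuple(
        m + d * steps_in_std * s
        for m, d, s in zip(dist.mean, direction, dist.std)
    )


def most_likely_context(point: Tuple[float, ...]) -> str:
    """Return the context under which the given tone point has the highest density."""
    return max(CONTEXTS, key=lambda name: CONTEXTS[name].log_density(point))
```

In this sketch, a user of the voice would call `traverse("intimate", (1.0, 0.0), 1.0)` to raise pitch by one standard deviation while staying within the "intimate" context, and `most_likely_context` indicates which context a chosen tone point most plausibly belongs to. A real system would likely use a learned, higher-dimensional distribution rather than hand-set Gaussians.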
Taxonomy
Topics: Social Robot Interaction and HRI · Speech and dialogue systems · AI in Service Interactions
