Exploring emotional prototypes in a high dimensional TTS latent space
Pol van Rijn, Silvan Mertes, Dominik Schiller, Peter M. C. Harrison, Pauline Larrouy-Maestri, Elisabeth André, Nori Jacoby

TL;DR
This paper investigates how emotional prosody is represented in the high-dimensional latent space of a TTS model, revealing identifiable emotional prototypes that are recognized by humans and transferable across sentences.
Contribution
It introduces a novel method using Gibbs Sampling with People to explore and identify emotional prototypes in a TTS latent space, linking model regions to perceived emotions.
Findings
Certain latent space regions are reliably associated with specific emotions
Emotional prototypes are well-recognized by human raters
Prototypes can be transferred to new sentences effectively
Abstract
Recent TTS systems are able to generate prosodically varied and realistic speech. However, it is unclear how this prosodic variation contributes to the perception of speakers' emotional states. Here we use the recent psychological paradigm 'Gibbs Sampling with People' to search the prosodic latent space in a trained GST Tacotron model to explore prototypes of emotional prosody. Participants are recruited online and collectively manipulate the latent space of the generative speech model in a sequentially adaptive way so that the stimulus presented to one group of participants is determined by the response of the previous groups. We demonstrate that (1) particular regions of the model's latent space are reliably associated with particular emotions, (2) the resulting emotional prototypes are well-recognized by a separate group of human raters, and (3) these emotional prototypes can be…
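The sequentially adaptive procedure described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' implementation: it does not call a GST Tacotron model, and the names `simulated_rater` and `gsp_chain`, the slider grid, the hidden `target` vector, and the median aggregation are all assumptions made for illustration. It only shows the general shape of a Gibbs-style chain in which each group of raters adjusts one latent dimension while the others stay fixed, and the aggregated response becomes the starting point for the next group.

```python
import random
import statistics


def simulated_rater(candidates, target_value, noise, rng):
    """Hypothetical stand-in for a human moving a slider.

    Picks the slider position closest to a noisy percept of a hidden
    'best' value. Real participants would listen to synthesized speech.
    """
    perceived = target_value + rng.gauss(0.0, noise)
    return min(candidates, key=lambda c: abs(c - perceived))


def gsp_chain(n_dims, n_iterations, raters_per_step, target, noise=0.1, seed=0):
    """Run one toy chain over an n_dims-dimensional latent vector."""
    rng = random.Random(seed)
    # Slider positions offered for the dimension being manipulated (assumed range).
    grid = [i / 10 - 1.0 for i in range(21)]
    # Start the chain at a random point in the latent space.
    state = [rng.choice(grid) for _ in range(n_dims)]
    for step in range(n_iterations):
        dim = step % n_dims  # cycle through dimensions, one per iteration
        # Every rater in this group adjusts only `dim`; other dimensions stay fixed.
        responses = [
            simulated_rater(grid, target[dim], noise, rng)
            for _ in range(raters_per_step)
        ]
        # Aggregate the group's responses (median is an assumption here);
        # the next group starts from the updated state.
        state[dim] = statistics.median(responses)
    return state


if __name__ == "__main__":
    hidden_prototype = [0.5, -0.3, 0.0]  # hypothetical "emotion prototype"
    print(gsp_chain(3, 6, 5, hidden_prototype, noise=0.05))
```

In this toy setting the chain drifts toward the hidden target because every simulated rater shares the same underlying preference. With human participants, where the chain settles is an empirical question.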
Taxonomy
Methods: Sigmoid Activation · Highway Layer · Convolution · Bidirectional GRU · Highway Network · Dropout · Dense Connections · Batch Normalization · Tanh Activation
