# Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis

**Authors:** Noé Tits, Fengna Wang, Kevin El Haddad, Vincent Pagel, Thierry Dutoit

arXiv: 1903.11570 · 2019-03-28

## TL;DR

This paper analyzes and interprets latent spaces in expressive speech synthesis, enabling more controllable and understandable systems by revealing how different latent variables influence speech expressiveness.

## Contribution

It provides a comparative analysis of latent spaces in speech synthesis, offering insights into their interpretability and influence on speech expressiveness.

## Key findings

- Different latent spaces differ in how readily they can be interpreted
- Latent variables significantly influence speech style
- Analysis enables more controllable speech synthesis systems

## Abstract

The field of Text-to-Speech has experienced huge improvements in recent years, benefiting from deep learning techniques. Producing realistic speech is now possible. As a consequence, research on controlling expressiveness, which allows speech to be generated in different styles or manners, has attracted increasing attention lately. Systems able to control style have been developed and show impressive results. However, the control parameters often consist of latent variables and remain complex to interpret. In this paper, we analyze and compare different latent spaces and obtain an interpretation of their influence on expressive speech. This will make it possible to build controllable speech synthesis systems with understandable behaviour.
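
To make the idea of interpreting a latent space through audio analysis concrete, here is a minimal sketch. It is not the authors' method, since the full text is not included in this summary. It assumes one latent vector per utterance and a single audio feature per utterance (mean F0 is used as an example). The data is synthetic, and the names `pca_project`, `feature_correlations`, and `mean_f0` are hypothetical. The sketch projects the latents to 2-D for plotting and measures how strongly each latent dimension correlates with the audio feature.

```python
import numpy as np


def pca_project(latents, n_components=2):
    """Project latent vectors onto their top principal components via SVD."""
    centered = latents - latents.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def feature_correlations(latents, feature):
    """Pearson correlation between each latent dimension and one audio feature."""
    z = (latents - latents.mean(axis=0)) / latents.std(axis=0)
    f = (feature - feature.mean()) / feature.std()
    return (z * f[:, None]).mean(axis=0)


# Synthetic stand-in data (assumption): 200 utterances, 8 latent dimensions.
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 8))
# Hypothetical audio feature: mean F0 in Hz, driven mostly by latent dimension 3.
mean_f0 = 120.0 + 30.0 * latents[:, 3] + rng.normal(scale=5.0, size=200)

coords = pca_project(latents)                  # 2-D coordinates for a scatter plot
corr = feature_correlations(latents, mean_f0)  # one correlation per latent dimension
best_dim = int(np.argmax(np.abs(corr)))        # dimension most related to mean F0
```

A dimension with a high absolute correlation is a candidate control knob for that audio property. With real data, the latents would come from a trained synthesis model and the features from an audio analysis tool.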

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.11570/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1903.11570/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1903.11570/full.md

---
Source: https://tomesphere.com/paper/1903.11570