# Exploring Transfer Learning for Low Resource Emotional TTS

**Authors:** Noé Tits, Kevin El Haddad, Thierry Dutoit

arXiv: 1901.04276 · 2019-01-15

## TL;DR

This paper investigates transfer learning for low-resource emotional TTS: a pre-trained deep-learning TTS model is fine-tuned on a small dataset of a new speaker, and the resulting neutral model is then fine-tuned on a small emotional dataset. The goal is to reduce the amount of data needed to model speaker and emotion variability.

## Contribution

It investigates whether fine-tuning a pre-trained deep-learning TTS model can (1) adapt the model to a new speaker using a small dataset and (2) turn the resulting neutral TTS model into an emotional one using a small emotional dataset.

## Key findings

- Fine-tuning improves emotional TTS quality with limited data
- Transfer learning enables speaker variability modeling with small datasets
- Emotion adaptation is feasible with minimal emotional data

## Abstract

During the last few years, spoken language technologies have known a big improvement thanks to Deep Learning. However Deep Learning-based algorithms require amounts of data that are often difficult and costly to gather. Particularly, modeling the variability in speech of different speakers, different styles or different emotions with few data remains challenging. In this paper, we investigate how to leverage fine-tuning on a pre-trained Deep Learning-based TTS model to synthesize speech with a small dataset of another speaker. Then we investigate the possibility to adapt this model to have emotional TTS by fine-tuning the neutral TTS model with a small emotional dataset.
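
The two-stage fine-tuning described in the abstract can be illustrated with a toy sketch. This is not the paper's model or training setup: a two-parameter linear regressor stands in for the TTS network, and the synthetic "source", "new speaker", and "emotional" datasets, the learning rate, and the epoch counts are all hypothetical values chosen for illustration. The sketch only shows the warm-start pattern: each stage starts from the previous stage's parameters rather than from scratch.

```python
import random

def make_data(w, b, n, rng, noise=0.05):
    """Synthetic (x, y) pairs from y = w*x + b plus Gaussian noise (toy stand-in for speech data)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, w * x + b + rng.gauss(0.0, noise)))
    return data

def mse(params, data):
    """Mean squared error of the linear model on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(params, data, lr, epochs):
    """Full-batch gradient descent on MSE, starting from the given parameters."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * gw
        b -= lr * gb
    return (w, b)

rng = random.Random(0)
# Hypothetical datasets: one large source corpus, two small adaptation sets.
source = make_data(2.0, 0.5, 2000, rng)      # large "source speaker" data
target = make_data(2.2, 0.4, 20, rng)        # small "new speaker" data
emotional = make_data(2.5, 0.8, 20, rng)     # small "emotional" data
emotional_test = make_data(2.5, 0.8, 500, rng, noise=0.0)

# Stage 0: pre-train on the large dataset.
pretrained = train((0.0, 0.0), source, lr=0.1, epochs=200)
# Stage 1: fine-tune on the small new-speaker set (warm start).
speaker_adapted = train(pretrained, target, lr=0.1, epochs=10)
# Stage 2: fine-tune the adapted model on the small emotional set.
emotion_adapted = train(speaker_adapted, emotional, lr=0.1, epochs=10)
# Baseline: same small emotional set and budget, but from random-free zero init.
from_scratch = train((0.0, 0.0), emotional, lr=0.1, epochs=10)

finetuned_err = mse(emotion_adapted, emotional_test)
scratch_err = mse(from_scratch, emotional_test)
print("fine-tuned error:", finetuned_err)
print("from-scratch error:", scratch_err)
```

In this toy setting the warm-started model reaches a lower held-out error than the from-scratch baseline under the same small training budget, because its starting parameters are already close to the target. Whether and how much this carries over to a real TTS model is what the paper examines; the sketch makes no claim about its results.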

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.04276/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1901.04276/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1901.04276/full.md

---
Source: https://tomesphere.com/paper/1901.04276