Singing Synthesis: with a little help from my attention

Orazio Angelini; Alexis Moinet; Kayoko Yanagisawa; Thomas Drugman

arXiv:1912.05881 · eess.AS · May 7, 2020

TL;DR

UTACO is an attention-based singing synthesis model that improves naturalness over previous neural singing synthesis models without requiring durations or pitch patterns as inputs, showing that sequence-to-sequence models can be applied successfully to singing synthesis.

Contribution

This work applies attention-based sequence-to-sequence models to singing synthesis, reducing the need for explicit modelling of voice features such as F0 patterns, vibrato, and note and phoneme durations, while improving naturalness over previous neural models.

Findings

01

Improves naturalness over previous neural singing synthesis models

02

Learns vibrato autonomously from musical context

03

Requires no durations or pitch patterns as inputs

Abstract

We present UTACO, a singing synthesis model based on an attention-based sequence-to-sequence mechanism and a vocoder based on dilated causal convolutions. These two classes of models have significantly affected the field of text-to-speech, but have never been thoroughly applied to the task of singing synthesis. UTACO demonstrates that attention can be successfully applied to the singing synthesis field and improves naturalness over the state of the art. The system requires considerably less explicit modelling of voice features such as F0 patterns, vibratos, and note and phoneme durations, than previous models in the literature. Despite this, it shows a strong improvement in naturalness with respect to previous neural singing synthesis models. The model does not require any durations or pitch patterns as inputs, and learns to insert vibrato autonomously according to the musical context.…
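The abstract mentions a vocoder based on dilated causal convolutions without giving architectural details. As a rough illustration of that general operation only, and not of UTACO's actual vocoder, the following sketch shows a single dilated causal convolution over a 1-D signal in plain Python. The function names `dilated_causal_conv1d` and `receptive_field` are hypothetical, and the zero padding before the first sample is an assumption made for this example.

```python
# Illustrative sketch only: a generic dilated causal convolution,
# not the vocoder described in the paper. All names are hypothetical.

def dilated_causal_conv1d(signal, kernel, dilation=1):
    """Compute y[t] = sum_k kernel[k] * signal[t - k * dilation].

    Samples before t = 0 are treated as zero (assumed padding), so each
    output depends only on the current and past inputs (causality).
    """
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that would reach before the start
                acc += weight * signal[idx]
        output.append(acc)
    return output


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by a stack of dilated causal layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # With dilation 2, each output adds the current sample and the one
    # two steps earlier.
    print(dilated_causal_conv1d([1, 2, 3, 4], [1, 1], dilation=2))
    # Doubling dilations grow the receptive field exponentially with depth.
    print(receptive_field(2, [1, 2, 4, 8]))
```

Stacking layers with doubling dilations is the usual reason for using this operation in waveform models: the receptive field grows exponentially with depth while each layer stays cheap.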
