Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer
Merlijn Blaauw, Jordi Bonada

TL;DR
This paper introduces a feed-forward, Transformer-based sequence-to-sequence singing synthesizer that does not require pre-aligned training data, enabling faster inference and avoiding the exposure bias that affects autoregressive models trained with teacher forcing.
Contribution
It presents a feed-forward Transformer model for singing synthesis that starts from an approximate initial alignment, derived from the musical score with a simple duration model, and refines it without autoregressive decoding.
Findings
Faster inference compared to autoregressive models
Effective refinement of initial alignments using self-attention
Importance of duration model accuracy for synthesis quality
Abstract
We propose a sequence-to-sequence singing synthesizer, which avoids the need for training data with pre-aligned phonetic and acoustic features. Rather than the more common approach of a content-based attention mechanism combined with an autoregressive decoder, we use a different mechanism suitable for feed-forward synthesis. Given that phonetic timings in singing are highly constrained by the musical score, we derive an approximate initial alignment with the help of a simple duration model. Then, using a decoder based on a feed-forward variant of the Transformer model, a series of self-attention and convolutional layers refines the result of the initial alignment to reach the target acoustic features. Advantages of this approach include faster inference and avoiding the exposure bias issues that affect autoregressive models trained by teacher forcing. We evaluate the effectiveness of…
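The abstract only outlines the pipeline. The sketch below illustrates its two stages in NumPy under our own assumptions; it is not the authors' implementation. The duration model is replaced by given raw per-phoneme durations. These are rescaled so that each note's phonemes exactly fill that note's length from the score, which is a hypothetical fitting rule. Phoneme features are then repeated frame by frame to form the initial alignment, in the style of a FastSpeech-like length regulator. Finally, one single-head self-attention layer with a residual connection refines all frames in a single parallel pass. The paper's decoder stacks several self-attention and convolutional layers. The function names `fit_durations_to_notes`, `expand_to_frames` and `self_attention_refine` are hypothetical.

```python
import numpy as np

def fit_durations_to_notes(raw_durations, note_index, note_frames):
    """Hypothetical rule: rescale predicted phoneme durations so the phonemes of
    each note exactly fill that note's length (in frames) from the score.

    raw_durations: positive predicted duration per phoneme (stand-in for a duration model)
    note_index:    index of the note each phoneme belongs to
    note_frames:   length of each note in frames, taken from the musical score
    """
    raw = np.asarray(raw_durations, dtype=float)
    idx = np.asarray(note_index)
    out = np.zeros(len(raw), dtype=int)
    for n, total in enumerate(note_frames):
        members = np.flatnonzero(idx == n)
        if members.size == 0:
            continue
        scaled = raw[members] / raw[members].sum() * total
        ints = np.floor(scaled).astype(int)
        # Give the frames lost to flooring to the phonemes with the largest remainders.
        leftover = total - ints.sum()
        order = np.argsort(scaled - ints)[::-1]
        ints[order[:leftover]] += 1
        out[members] = ints
    return out

def expand_to_frames(phoneme_features, durations):
    """Initial alignment: repeat each phoneme's feature vector for its number of frames."""
    return np.repeat(phoneme_features, durations, axis=0)

def self_attention_refine(x, w_q, w_k, w_v, w_o):
    """Single-head scaled dot-product self-attention with a residual connection.

    Every frame attends to every other frame in one parallel pass; nothing depends
    on previously generated outputs, so no autoregressive decoding is involved.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + (weights @ v) @ w_o

# Usage: two notes of 9 and 8 frames, each with two phonemes.
rng = np.random.default_rng(0)
durations = fit_durations_to_notes([1.0, 3.0, 1.0, 3.0], note_index=[0, 0, 1, 1], note_frames=[9, 8])
print(durations)  # [2 7 2 6]
phoneme_feats = rng.standard_normal((4, 16))
frames = expand_to_frames(phoneme_feats, durations)  # shape (17, 16)
weights = [rng.standard_normal((16, 16)) * 0.1 for _ in range(4)]
refined = self_attention_refine(frames, *weights)
print(refined.shape)  # (17, 16)
```

Because the frame count is fixed by the score before decoding starts, every frame can be refined at once. This is where the speed advantage over autoregressive decoding comes from. It also means that errors in the duration stage carry straight into the initial alignment, which fits the finding that duration model accuracy matters for synthesis quality.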
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
