Self-Attention Linguistic-Acoustic Decoder
Santiago Pascual, Antonio Bonafonte, Joan Serrà

TL;DR
This paper introduces a transformer-based linguistic-acoustic decoder for text-to-speech conversion that maintains low distortion while significantly improving inference speed, making it suitable for resource-constrained devices.
Contribution
The work presents a non-recurrent, transformer-based decoder for speech synthesis whose distortion is comparable to that of RNNs, with much faster inference.
Findings
Distortion competitive with a recurrent baseline (Mel cepstral distortion rises by 0.1 to 0.3 dB on average)
Over an order of magnitude faster CPU inference
Suitable for deployment on resource-limited devices
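The distortion referred to in these findings is Mel cepstral distortion (MCD), measured in dB. As a rough illustration of what a 0.1 to 0.3 dB difference is a difference of, here is a minimal sketch of the commonly used MCD formula. The function name, the array layout, and the assumption that the two sequences are already time-aligned are ours, not taken from the paper, and the paper's exact variant (for example, which coefficients are included) may differ.

```python
import numpy as np

def mel_cepstral_distortion(ref, gen):
    """Mean MCD in dB between two aligned (frames, coeffs) arrays.

    Assumes both sequences are already time-aligned and hold the same
    set of mel cepstral coefficients (hypothetical layout).
    """
    ref = np.asarray(ref, dtype=float)
    gen = np.asarray(gen, dtype=float)
    if ref.shape != gen.shape:
        raise ValueError("sequences must be time-aligned and equally shaped")
    diff = ref - gen
    # Per-frame distortion: (10 / ln 10) * sqrt(2 * sum of squared differences)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    # Average over frames to get a single figure for the utterance
    return float(per_frame.mean())
```

Lower values mean the generated cepstra are closer to the reference; identical sequences give 0 dB.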
Abstract
The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models like recurrent neural networks. Despite the good performance of such models (in terms of low distortion in the generated speech), their recursive structure tends to make them slow to train and to sample from. In this work, we try to overcome the limitations of recursive structure by using a module based on the transformer decoder network, designed without recurrent connections but emulating them with attention and positioning codes. Our results show that the proposed decoder network is competitive in terms of distortion when compared to a recurrent baseline, whilst being significantly faster in terms of CPU inference time. On average, it increases Mel cepstral distortion between 0.1 and 0.3 dB, but it is over an…
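To make the idea of replacing recurrence with attention and positioning codes concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the sinusoidal position codes and the causally masked, single-head scaled dot-product attention of the original Transformer; the function names, dimensions, and random weights are hypothetical.

```python
import numpy as np

def positional_codes(num_steps, dim):
    """Sinusoidal position codes (assumed variant), shape (num_steps, dim)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    positions = np.arange(num_steps)[:, None]
    rates = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    codes = np.zeros((num_steps, dim))
    codes[:, 0::2] = np.sin(positions * rates)  # even channels
    codes[:, 1::2] = np.cos(positions * rates)  # odd channels
    return codes

def self_attention(x, w_q, w_k, w_v, causal=True):
    """Single-head scaled dot-product self-attention over a whole sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if causal:
        # Block attention to future steps (strictly upper triangle)
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over each row
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical toy input: 5 time steps of 8-dimensional features
rng = np.random.default_rng(0)
steps, dim = 5, 8
x = rng.normal(size=(steps, dim)) + positional_codes(steps, dim)
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Every time step is computed in the same set of matrix products rather than one after another, which is the property a recurrent network lacks; the position codes supply the ordering information that recurrence would otherwise provide.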
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
