Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis

Fan-Lin Wang; Po-chun Hsu; Da-rong Liu; Hung-yi Lee

arXiv:2204.00170 · eess.AS · November 1, 2022

TL;DR

This paper introduces Universal Adaptor, a method that converts Mel-spectrograms between different configurations. It enables flexible pairing of synthesizers and vocoders in speech synthesis systems, with speech quality comparable to that obtained from ground-truth Mel-spectrograms.

Contribution

We propose a universal adaptor that converts Mel-spectrograms across configurations, facilitating compatibility among diverse speech synthesis components.

Findings

01. Speech quality comparable to ground-truth Mel-spectrograms

02. Effective in both single- and multi-speaker scenarios

03. Applicable to TTS and voice conversion systems

Abstract

Most recent speech synthesis systems are composed of a synthesizer and a vocoder. However, existing synthesizers and vocoders only work with acoustic features extracted under a specific configuration. Hence, arbitrary synthesizers and vocoders cannot be combined to form a complete system, let alone applied to a newly developed model. In this paper, we propose Universal Adaptor, which takes a Mel-spectrogram parametrized by a source configuration and, given both the source and the target configurations as input, converts it into a Mel-spectrogram parametrized by the target configuration. Experiments show that the quality of speech synthesized from the output of Universal Adaptor is comparable to that of speech synthesized from ground-truth Mel-spectrograms in both single-speaker and multi-speaker scenarios. Moreover, Universal Adaptor can be applied in the recent TTS…
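The abstract describes the adaptor's interface: a Mel-spectrogram plus a source and a target configuration go in, and a Mel-spectrogram in the target configuration comes out. To make that interface concrete, the following is a minimal, non-learned sketch in Python/NumPy. It is not the authors' method, which is described in the paper and implemented in the official repository. It is a naive signal-processing baseline that assumes HTK-style triangular Mel filters, linear-amplitude (not log) input, and no correction for window-dependent magnitude scaling. The names `MelConfig`, `mel_filterbank`, and `naive_convert`, as well as the two example configurations, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class MelConfig:
    """Hypothetical container for the parameters that define a Mel-spectrogram."""
    sample_rate: int
    n_fft: int
    hop_length: int
    n_mels: int
    fmin: float = 0.0
    fmax: Optional[float] = None  # None means "up to Nyquist"


def hz_to_mel(f):
    # HTK Mel scale (an assumption; other toolkits use the Slaney scale)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(cfg: MelConfig) -> np.ndarray:
    """Triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    fmax = cfg.fmax if cfg.fmax is not None else cfg.sample_rate / 2
    n_bins = cfg.n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, cfg.sample_rate / 2, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(cfg.fmin), hz_to_mel(fmax), cfg.n_mels + 2))
    fb = np.zeros((cfg.n_mels, n_bins))
    for i in range(cfg.n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def naive_convert(mel_src: np.ndarray, src: MelConfig, dst: MelConfig) -> np.ndarray:
    """Re-parametrize a linear-amplitude Mel-spectrogram of shape (src.n_mels, T)."""
    # 1. Approximate the linear spectrogram via the filterbank pseudo-inverse.
    linear = np.maximum(np.linalg.pinv(mel_filterbank(src)) @ mel_src, 0.0)
    # 2. Resample the frequency axis onto the target STFT bins (zero above source Nyquist).
    src_freqs = np.linspace(0.0, src.sample_rate / 2, src.n_fft // 2 + 1)
    dst_freqs = np.linspace(0.0, dst.sample_rate / 2, dst.n_fft // 2 + 1)
    linear = np.stack([np.interp(dst_freqs, src_freqs, frame, right=0.0) for frame in linear.T], axis=1)
    # 3. Resample the time axis: frame k is centered at k * hop_length / sample_rate seconds.
    n_src = mel_src.shape[1]
    src_times = np.arange(n_src) * src.hop_length / src.sample_rate
    duration = n_src * src.hop_length / src.sample_rate
    n_dst = max(1, int(round(duration * dst.sample_rate / dst.hop_length)))
    dst_times = np.arange(n_dst) * dst.hop_length / dst.sample_rate
    linear = np.stack([np.interp(dst_times, src_times, row) for row in linear], axis=0)
    # 4. Project onto the target Mel filterbank.
    return mel_filterbank(dst) @ linear


# Two illustrative configurations (hypothetical values).
src = MelConfig(sample_rate=22050, n_fft=1024, hop_length=256, n_mels=80, fmax=8000.0)
dst = MelConfig(sample_rate=24000, n_fft=2048, hop_length=300, n_mels=100)
mel_src = np.random.default_rng(0).random((80, 200))
mel_dst = naive_convert(mel_src, src, dst)
print(mel_dst.shape)  # (100, 186)
```

A baseline like this shows why the conversion is not trivial. The pseudo-inverse cannot restore spectral detail that the source filterbank discarded, and the source `fmax` caps the usable bandwidth, so a purely analytic conversion loses information that a vocoder expecting the target configuration would otherwise receive.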

Code & Models

Repositories

BogiHsu/Universal-Adaptor
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems