An Attribute Interpolation Method in Speech Synthesis by Model Merging

Masato Murata; Koichi Miyazaki; Tomoki Koriyama

arXiv:2407.00766 · cs.SD · July 2, 2024

Open Access

TL;DR

This paper introduces a simple and effective attribute interpolation method in speech synthesis by merging trained models, enabling smooth control over speaker and emotion attributes without additional training.

Contribution

The paper proposes a novel model merging technique for attribute interpolation in speech synthesis that does not require specialized modules or retraining.

Findings

01. Achieved smooth attribute interpolation in speaker generation.
02. Successfully controlled emotion intensity through model merging.
03. Maintained linguistic content during attribute interpolation.

Abstract

With the development of speech synthesis, recent research has focused on challenging tasks, such as speaker generation and emotion intensity control. Attribute interpolation is a common approach to these tasks. However, most previous methods for attribute interpolation require specific modules or training methods. We propose an attribute interpolation method in speech synthesis by model merging. Model merging is a method that creates new parameters by only averaging the parameters of base models. The merged model can generate an output with an intermediate feature of the base models. This method is easily applicable without specific modules or training methods, as it uses only existing trained base models. We merged two text-to-speech models to achieve attribute interpolation and evaluated its performance on speaker generation and emotion intensity control tasks. As a result, our…
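
The parameter averaging described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes both base models share identical parameter names and shapes, represents each model as a plain dictionary of float lists, and generalizes simple averaging to a weighted interpolation with a coefficient `alpha` (an assumption made here for illustration; `alpha = 0.5` gives the plain average). The names `merge_models`, `speaker_a`, and `speaker_b` are hypothetical.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat
# list of floats. Real TTS checkpoints hold tensors; this is a toy stand-in.
Params = Dict[str, List[float]]


def merge_models(params_a: Params, params_b: Params, alpha: float = 0.5) -> Params:
    """Return (1 - alpha) * A + alpha * B for every parameter.

    alpha = 0.0 reproduces model A, alpha = 1.0 reproduces model B,
    and alpha = 0.5 is the plain average of the two base models.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if params_a.keys() != params_b.keys():
        raise ValueError("base models must share the same parameter names")

    merged: Params = {}
    for name, values_a in params_a.items():
        values_b = params_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * a + alpha * b for a, b in zip(values_a, values_b)
        ]
    return merged


# Toy parameters standing in for two trained base models (hypothetical values).
speaker_a: Params = {"encoder.weight": [0.0, 2.0], "decoder.bias": [1.0]}
speaker_b: Params = {"encoder.weight": [4.0, 6.0], "decoder.bias": [3.0]}

midpoint = merge_models(speaker_a, speaker_b, alpha=0.5)
```

Sweeping `alpha` from 0 to 1 in this sketch moves the merged parameters from one base model to the other, which is the kind of intermediate point the abstract refers to when it describes an output with an intermediate feature of the base models.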

Taxonomy

Topics: Speech Recognition and Synthesis