MoEVC: A Mixture-of-experts Voice Conversion System with Sparse Gating Mechanism for Accelerating Online Computation

Yu-Tao Chang; Yuan-Hong Yang; Yu-Huai Peng; Syu-Siang Wang; Tai-Shih Chi; Yu Tsao; Hsin-Min Wang

arXiv:1912.11984·cs.SD·December 30, 2019

TL;DR

This paper introduces MoEVC, a mixture-of-experts voice conversion system with a sparse gating mechanism that significantly reduces computational load and enhances conversion quality, enabling faster online processing.

Contribution

The study presents a novel MoE-based VC system with sparse gating, improving online efficiency and performance over existing methods.

Findings

01. 70% reduction in FLOPs achieved

02. Enhanced voice conversion quality in evaluations

03. Effective acceleration of online computation

Abstract

With recent advances in deep learning, the quality and similarity of voice conversion (VC) have improved significantly. However, deep-learning-based VC systems generally require heavy computation, which can cause notable latency and thus limit their deployment in real-world applications. Increasing online computation efficiency has therefore become an important task. In this study, we propose a novel mixture-of-experts (MoE) based VC system. The MoE model uses a gating mechanism to assign optimal weights to feature maps to increase VC performance. In addition, imposing sparse constraints on the gating mechanism can accelerate online computation: feature maps whose gates are zeroed out are redundant, so the convolutions that produce them can be skipped. Experimental results show that by specifying suitable sparse constraints, we can effectively increase the…
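The skipping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gated_conv1d`, the tensor shapes, and the choice of one scalar gate per output feature map are assumptions made for illustration, and the gate values are passed in directly rather than produced by a learned gating network. The sketch only shows that when a gate is exactly zero, the convolution for that feature map need not be computed at all.

```python
import numpy as np

def gated_conv1d(x, kernels, gates):
    """Hypothetical sparse-gated 1-D convolution (illustrative only).

    x:       (c_in, T) input feature maps
    kernels: (c_out, c_in, k) convolution kernels
    gates:   (c_out,) gate weight per output feature map; zeros mean "skip"
    Returns (out, n_computed), where n_computed counts the output
    feature maps whose convolution was actually evaluated.
    """
    c_out, c_in, k = kernels.shape
    t_out = x.shape[1] - k + 1
    out = np.zeros((c_out, t_out))

    # Only feature maps with a non-zero gate are computed.
    active = np.flatnonzero(gates)
    for o in active:
        for i in range(c_in):
            # 'valid' cross-correlation, as in a standard conv layer
            out[o] += np.correlate(x[i], kernels[o, i], mode="valid")
        out[o] *= gates[o]  # weight the feature map by its gate
    return out, len(active)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16))
kernels = rng.standard_normal((4, 3, 5))
gates = np.array([0.0, 1.5, 0.0, 0.5])  # two of four gates are zero

out, n_computed = gated_conv1d(x, kernels, gates)
print(f"computed {n_computed} of {len(gates)} feature maps")
```

In this toy layer, the multiply-adds saved scale with the fraction of gates that are exactly zero; the result is identical to computing every feature map and then multiplying by the gates, since the skipped maps would have been zeroed anyway.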

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing