Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, and Hsin-Min Wang

arXiv:1610.03988 · stat.ML · October 14, 2016

TL;DR

This paper introduces a novel dictionary update method for NMF-based voice conversion that leverages an encoder-decoder framework, enabling effective use of high-dimensional spectral data and improving speech conversion quality with smaller dictionaries.

Contribution

The paper presents a new dictionary update approach for NMF in voice conversion: NMF is reformulated as an encoder-decoder network so that a small but effective dictionary can be learned, even from high-dimensional features such as STRAIGHT spectra.
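The paper's exact architecture and training procedure are not given in this summary (the abstract is truncated), so the following is only a hypothetical, simplified sketch of one generic way to read NMF as an encoder-decoder. Assumptions: the encoder maps source spectra X to nonnegative activations H using the source dictionary A (here, a fixed number of KL multiplicative updates); two linear decoders, A and B, reconstruct the source and target spectra from the shared H; and both dictionaries are updated by projected gradient steps on squared reconstruction and conversion errors, with H held constant within each step. The names `encode` and `update_dictionaries`, the loss, and all hyperparameters are illustrative, not taken from the paper.

```python
import numpy as np


def encode(X, A, n_iter=50, eps=1e-12):
    """Assumed encoder: activations H >= 0 with X ≈ A @ H via KL multiplicative updates, A fixed."""
    H = np.ones((A.shape[1], X.shape[1]))
    col_sum = A.sum(axis=0)[:, None] + eps  # (K, 1) = A.T @ ones, the update's denominator
    for _ in range(n_iter):
        H *= (A.T @ (X / (A @ H + eps))) / col_sum
    return H


def update_dictionaries(X, Y, A, B, lr=1e-3, n_epochs=100):
    """Hypothetical dictionary update in an encoder-decoder view of NMF.

    X: (F_src, T) source spectra; Y: (F_tgt, T) time-aligned target spectra.
    A: (F_src, K) source dictionary, used by the encoder and as the source decoder.
    B: (F_tgt, K) target dictionary, used as the target decoder.
    Each epoch re-encodes X with the current A, then takes one projected gradient
    step on 0.5*||A H - X||^2 + 0.5*||B H - Y||^2 with respect to A and B, H fixed.
    """
    A, B = A.copy(), B.copy()
    for _ in range(n_epochs):
        H = encode(X, A)                       # shared latent code from the source side
        grad_A = (A @ H - X) @ H.T             # gradient of the source reconstruction term
        grad_B = (B @ H - Y) @ H.T             # gradient of the conversion term
        A = np.maximum(A - lr * grad_A, 0.0)   # project back onto A >= 0
        B = np.maximum(B - lr * grad_B, 0.0)   # project back onto B >= 0
    return A, B
```

After training, converted spectra would be `B @ encode(X_new, A)`. With a step size `lr` below 1/||H Hᵀ||₂, each projected step cannot increase the corresponding error for the current H; a more complete version might also backpropagate through the encoder instead of treating H as a constant.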

Findings

1. Significant improvements with small dictionaries over traditional methods.
2. Effective exploitation of high-dimensional spectral features.
3. Enhanced speech conversion quality with reduced dictionary size.

Abstract

In this paper, we propose a dictionary update method for Nonnegative Matrix Factorization (NMF) with high-dimensional data in a spectral conversion (SC) task. Voice conversion has been widely studied due to its potential applications such as personalized speech synthesis and speech enhancement. Exemplar-based NMF (ENMF) has emerged as an effective, and probably the simplest, choice among all techniques for SC, as long as a source-target parallel speech corpus is given. ENMF-based SC systems usually need a large number of bases (exemplars) to ensure the quality of the converted speech. However, a small and effective dictionary is desirable but hard to obtain via dictionary update, in particular when high-dimensional features such as STRAIGHT spectra are used. Therefore, we propose a dictionary update framework for NMF by means of an encoder-decoder reformulation. Regarding NMF as an…
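To make the ENMF baseline concrete, here is a minimal sketch of exemplar-based NMF spectral conversion, assuming the common setup in which a source dictionary `A_src` and a target dictionary `B_tgt` are built from time-aligned frames of a parallel corpus (column k of each comes from the same aligned frame). Activations are estimated on the source side with the dictionary fixed, using Lee-Seung multiplicative updates for the generalized KL divergence (no sparsity penalty, which many ENMF systems add), and the converted spectra are `B_tgt @ H`. Function names, shapes, and iteration counts are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def estimate_activations(X, A, n_iter=200, eps=1e-12, seed=0):
    """Nonnegative activations H such that X ≈ A @ H, with the dictionary A held fixed.

    Lee-Seung multiplicative updates for the generalized KL divergence.
    X: (F, T) nonnegative source spectra, one frame per column.
    A: (F, K) nonnegative source dictionary (K exemplars).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[1], X.shape[1])) + eps  # random positive initialization
    col_sum = A.sum(axis=0)[:, None] + eps          # (K, 1) = A.T @ ones, the update's denominator
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)                   # elementwise X / (A H)
        H *= (A.T @ ratio) / col_sum
    return H


def convert(X_src, A_src, B_tgt, **kwargs):
    """Exemplar-based conversion: activations from the source side, decoded with target exemplars."""
    H = estimate_activations(X_src, A_src, **kwargs)
    return B_tgt @ H  # (F_tgt, T) converted spectra


# Toy usage with random nonnegative data (not real spectra).
rng = np.random.default_rng(1)
A_src = rng.random((20, 5))   # 5 source exemplars of dimension 20
B_tgt = rng.random((30, 5))   # the 5 paired target exemplars of dimension 30
X_src = A_src @ rng.random((5, 8))
Y_hat = convert(X_src, A_src, B_tgt)
print(Y_hat.shape)  # (30, 8)
```

The shared column index is what lets activations estimated against source exemplars drive the target exemplars. It also shows why ENMF dictionaries tend to be large: every exemplar is a stored frame pair, which is the cost the proposed dictionary update aims to reduce.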

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques