Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network
Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, and Hsin-Min Wang

TL;DR
This paper introduces a novel dictionary update method for NMF-based voice conversion that leverages an encoder-decoder framework, enabling effective use of high-dimensional spectral data and improving speech conversion quality with smaller dictionaries.
Contribution
The paper presents a new dictionary update approach for NMF in voice conversion, reformulating NMF as an encoder-decoder network to enhance efficiency and effectiveness.
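To make the encoder-decoder reading of NMF concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single linear encoder followed by a ReLU (the matrix `E`, hypothetical) that maps spectral frames to nonnegative activations, and a linear decoder whose nonnegative weights `W` play the role of the NMF dictionary. The training loop, learning rate, and initialization are illustrative assumptions only.

```python
import numpy as np

def encode(X, E):
    # Hypothetical encoder: linear map + ReLU, so activations stay nonnegative.
    return np.maximum(E @ X, 0.0)

def decode(H, W):
    # Decoder: dictionary times activations, i.e. the usual NMF reconstruction.
    return W @ H

def train(X, n_bases, lr=0.02, n_steps=500, seed=0):
    """Jointly fit encoder E and nonnegative dictionary W by projected gradient descent.

    X has shape (n_features, n_frames). All hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = X.shape
    W = 0.1 * rng.random((n_features, n_bases))   # dictionary (decoder weights)
    E = 0.1 * rng.random((n_bases, n_features))   # encoder weights
    losses = []
    for _ in range(n_steps):
        H = encode(X, E)
        R = decode(H, W) - X                      # reconstruction residual
        losses.append(0.5 * np.sum(R ** 2) / n_frames)
        grad_W = R @ H.T / n_frames
        grad_H = W.T @ R / n_frames
        grad_E = (grad_H * (H > 0)) @ X.T         # backprop through the ReLU
        W = np.maximum(W - lr * grad_W, 0.0)      # project onto W >= 0
        E = E - lr * grad_E
    return E, W, losses
```

In this view, updating the dictionary amounts to updating the decoder weights, while the encoder replaces the iterative activation estimate of standard NMF with a single forward pass.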
Findings
Significant improvements over traditional dictionary update methods when the dictionary is small.
Effective exploitation of high-dimensional spectral features.
Enhanced speech conversion quality with reduced dictionary size.
Abstract
In this paper, we propose a dictionary update method for Nonnegative Matrix Factorization (NMF) with high-dimensional data in a spectral conversion (SC) task. Voice conversion has been widely studied due to its potential applications such as personalized speech synthesis and speech enhancement. Exemplar-based NMF (ENMF) emerges as an effective and probably the simplest choice among all techniques for SC, as long as a source-target parallel speech corpus is given. ENMF-based SC systems usually need a large number of bases (exemplars) to ensure the quality of the converted speech. A small yet effective dictionary is desirable but hard to obtain via dictionary update, in particular when high-dimensional features such as STRAIGHT spectra are used. Therefore, we propose a dictionary update framework for NMF by means of an encoder-decoder reformulation. Regarding NMF as an…
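For readers unfamiliar with exemplar-based NMF conversion, the following is a rough sketch of the general idea under stated assumptions, not the system evaluated in the paper. It assumes paired source and target dictionaries (`A_src`, `B_tgt`, hypothetical names) whose columns are time-aligned exemplars, estimates nonnegative activations for the source spectra with the standard Euclidean multiplicative update, and then applies those activations to the target dictionary. The iteration count and the choice of the Euclidean objective are illustrative.

```python
import numpy as np

def estimate_activations(X, A, n_iter=200, eps=1e-9, seed=0):
    """Find H >= 0 with X ~= A @ H, keeping the dictionary A fixed.

    Uses the multiplicative update for the Euclidean (Frobenius) objective.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[1], X.shape[1])) + eps
    AtX = A.T @ X
    AtA = A.T @ A
    for _ in range(n_iter):
        H *= AtX / (AtA @ H + eps)
    return H

def convert(X_src, A_src, B_tgt, n_iter=200):
    # Activations estimated on the source side are reused with the paired
    # target dictionary to produce the converted spectra.
    H = estimate_activations(X_src, A_src, n_iter=n_iter)
    return B_tgt @ H
```

Because every frame is reconstructed from the exemplars, the conversion quality in this scheme depends directly on how many bases the dictionary holds, which is the cost the abstract refers to.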
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
