Reducing one-to-many problem in Voice Conversion by equalizing the formant locations using dynamic frequency warping

Seyed Hamidreza Mohammadi

arXiv:1510.04205 · cs.SD · October 15, 2015 · 1 citation

TL;DR

This paper proposes a method using dynamic frequency warping to equalize formant locations in voice conversion, effectively reducing the one-to-many problem and improving speech quality.

Contribution

It introduces a novel formant equalization technique with dynamic frequency warping to address the one-to-many problem in voice conversion.

Findings

01. Subjective experiments show a significant improvement in speech quality.

02. Formant equalization reduces over-smoothing effects.

03. The method effectively addresses the one-to-many problem.

Abstract

In this study, we investigate a way to reduce the effect of the one-to-many problem in voice conversion (VC). The one-to-many problem arises when two very similar speech segments from the source speaker correspond to target-speaker segments that are not similar to each other. As a result, the mapping function usually over-smooths the generated features so that they resemble both target segments. We propose to equalize the formant locations of source-target frame pairs using dynamic frequency warping in order to reduce the complexity of the mapping. After conversion, a second dynamic frequency warping is applied to reverse the formant location equalization performed during training. Subjective experiments showed that the proposed approach improves speech quality significantly.
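
The warping step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that dynamic frequency warping is computed as dynamic time warping applied along the frequency axis of a single source-target frame pair, with a squared-difference cost between spectral bins. The function names (`dfw_path`, `warping_functions`, `apply_warp`), the synthetic single-peak frames, and the choice to warp the target frame onto the source frame's frequency axis are all assumptions made for illustration. The abstract does not say how the reverse warp is obtained at conversion time, so the sketch simply reuses the path from the training pair.

```python
import numpy as np

def dfw_path(src, tgt):
    """Align two spectral frames along frequency with a DTW-style dynamic program.

    Returns a monotonic list of (src_bin, tgt_bin) pairs.
    Assumption: squared difference between bin values is the local cost.
    """
    n = len(src)
    cost = (src[:, None] - tgt[None, :]) ** 2
    acc = np.full((n + 1, n + 1), np.inf)  # padded accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last bin pair to the first.
    i, j = n, n
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path

def warping_functions(path, n_bins):
    """Turn the path into two bin-to-bin maps (mean matched bin per index)."""
    pairs = np.array(path)
    src_to_tgt = np.array([pairs[pairs[:, 0] == i, 1].mean() for i in range(n_bins)])
    tgt_to_src = np.array([pairs[pairs[:, 1] == j, 0].mean() for j in range(n_bins)])
    return src_to_tgt, tgt_to_src

def apply_warp(spectrum, mapping):
    """Resample a spectrum at the (fractional) bin positions given by mapping."""
    return np.interp(mapping, np.arange(len(spectrum)), spectrum)

# Synthetic frames: one "formant" peak each, at different frequency bins.
bins = np.arange(64, dtype=float)
src = np.exp(-0.5 * ((bins - 20.0) / 3.0) ** 2)  # source peak at bin 20
tgt = np.exp(-0.5 * ((bins - 30.0) / 3.0) ** 2)  # target peak at bin 30

path = dfw_path(src, tgt)
src_to_tgt, tgt_to_src = warping_functions(path, len(bins))

# Equalization: move the target's peak to the source's peak location.
equalized = apply_warp(tgt, src_to_tgt)
# Reverse warp: put the peak back at its original target location.
restored = apply_warp(equalized, tgt_to_src)

print(int(np.argmax(equalized)), int(np.argmax(restored)))
```

In this toy case the mapper would be trained on `equalized` rather than `tgt`, so source and target frames share peak locations, and the reverse warp would be applied to the mapper's output. Real frames have several formants and need a smoother, more constrained warping function than this unconstrained dynamic program provides.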

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques