DNN-based cross-lingual voice conversion using Bottleneck Features

M Kiran Reddy; K Sreenivasa Rao

arXiv:1909.03974·eess.AS·November 12, 2019

TL;DR

This paper introduces a cross-lingual voice conversion (CLVC) method that uses bottleneck features and DNNs to convert speech to a target speaker's voice across languages. It requires no source-speaker data during training and outperforms a GMM-based CLVC approach.

Contribution

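The paper presents a novel CLVC framework that combines bottleneck features extracted from a deep auto-encoder (DAE) with a DNN that maps those features to the target speaker's spectral features, eliminating the need for source-speaker data during training.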

Findings

01 Outperforms GMM-based CLVC approach

02 Effective across three Indian languages

03 Captures speaker-specific characteristics accurately

Abstract

Cross-lingual voice conversion (CLVC) is a challenging task because the source and target speakers speak different languages. This paper proposes a CLVC framework based on bottleneck features and a deep neural network (DNN). In the proposed method, the bottleneck features extracted from a deep auto-encoder (DAE) are used to represent speaker-independent features of speech signals from different languages. A DNN model is trained to learn the mapping between the bottleneck features and the corresponding spectral features of the target speaker. The proposed method can capture speaker-specific characteristics of the target speaker, and hence requires no speech data from the source speaker during training. The performance of the proposed method is evaluated using data from three Indian languages: Telugu, Tamil and Malayalam. The experimental results show that the proposed method outperforms the…
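
The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the layer sizes, the tanh activations, the random untrained weights, and the helper names (`init_layer`, `forward`, `extract_bottleneck`, `reconstruct`, `convert_frame`) are all assumptions, since the abstract gives no architectural details. It only shows how a source frame would pass through the DAE encoder to a bottleneck vector and then through the mapping DNN to target-speaker spectral features.

```python
import math
import random

random.seed(0)  # reproducible illustration

# Hypothetical sizes; the abstract does not state feature dimensions.
SPECTRAL_DIM = 24
HIDDEN_DIM = 16
BOTTLENECK_DIM = 8


def init_layer(n_in, n_out):
    """Return (weights, biases) for one dense layer with small random weights."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers, linear_output):
    """Run one feature vector through dense layers with tanh activations.

    If linear_output is True, the last layer is left linear (regression output).
    """
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        is_last = i == len(layers) - 1
        if not (is_last and linear_output):
            x = [math.tanh(v) for v in x]
    return x


# Deep auto-encoder (DAE): the encoder narrows to the bottleneck and the
# decoder reconstructs the input. Weights are untrained placeholders here.
dae_encoder = [init_layer(SPECTRAL_DIM, HIDDEN_DIM), init_layer(HIDDEN_DIM, BOTTLENECK_DIM)]
dae_decoder = [init_layer(BOTTLENECK_DIM, HIDDEN_DIM), init_layer(HIDDEN_DIM, SPECTRAL_DIM)]

# Mapping DNN: bottleneck features -> spectral features of the target speaker.
mapping_dnn = [init_layer(BOTTLENECK_DIM, HIDDEN_DIM), init_layer(HIDDEN_DIM, SPECTRAL_DIM)]


def extract_bottleneck(frame):
    """Encoder half of the DAE: spectral frame -> bottleneck features."""
    return forward(frame, dae_encoder, linear_output=False)


def reconstruct(frame):
    """Full DAE pass; a trained DAE would minimise the error against `frame`."""
    return forward(extract_bottleneck(frame), dae_decoder, linear_output=True)


def convert_frame(source_frame):
    """Conversion: source frame -> bottleneck -> target-speaker spectral frame."""
    return forward(extract_bottleneck(source_frame), mapping_dnn, linear_output=True)


# A made-up source frame standing in for real spectral features.
source_frame = [random.gauss(0.0, 1.0) for _ in range(SPECTRAL_DIM)]
converted = convert_frame(source_frame)
print(len(extract_bottleneck(source_frame)), len(converted))  # prints: 8 24
```

In this sketch the mapping DNN only ever sees bottleneck features as input and target-speaker spectral features as output, which is consistent with the abstract's claim that no source-speaker data is needed during training.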
