DNN-based cross-lingual voice conversion using Bottleneck Features
M Kiran Reddy, K Sreenivasa Rao

TL;DR
This paper introduces a cross-lingual voice conversion method using bottleneck features and DNNs, enabling speaker-specific voice transformation across languages without source-speaker data during training, outperforming GMM-based methods.
Contribution
The paper presents a novel CLVC framework leveraging bottleneck features from a DAE and DNNs, eliminating the need for source speaker data during training.
Findings
Outperforms GMM-based CLVC approach
Effective across three Indian languages
Captures speaker-specific characteristics accurately
Abstract
Cross-lingual voice conversion (CLVC) is a challenging task because the source and target speakers speak different languages. This paper proposes a CLVC framework based on bottleneck features and a deep neural network (DNN). In the proposed method, the bottleneck features extracted from a deep auto-encoder (DAE) are used to represent speaker-independent features of speech signals from different languages. A DNN model is trained to learn the mapping between bottleneck features and the corresponding spectral features of the target speaker. The proposed method can capture speaker-specific characteristics of a target speaker, and hence requires no speech data from the source speaker during training. The performance of the proposed method is evaluated using data from three Indian languages: Telugu, Tamil and Malayalam. The experimental results show that the proposed method outperforms the…
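The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the tanh activations, the NumPy-only forward passes, and all function names (`extract_bottleneck`, `reconstruct`, `convert`) are assumptions made for the example. The weights are random and untrained, and the idea that a source utterance is passed through the DAE encoder and then the mapping DNN at conversion time is one reading of the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
SPEC_DIM = 40        # spectral feature dimension per frame
BOTTLENECK_DIM = 16  # width of the DAE bottleneck layer
HIDDEN_DIM = 64      # width of the other hidden layers


def init_layer(n_in, n_out):
    """Return (weights, bias) for one dense layer with small random weights."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def dense(x, layer, activation=np.tanh):
    """Apply one dense layer to a batch of frames with shape (frames, n_in)."""
    weights, bias = layer
    return activation(x @ weights + bias)


def linear(z):
    """Identity activation for output layers."""
    return z


# Deep auto-encoder: spectral -> hidden -> bottleneck -> hidden -> spectral.
dae_encoder = [init_layer(SPEC_DIM, HIDDEN_DIM),
               init_layer(HIDDEN_DIM, BOTTLENECK_DIM)]
dae_decoder = [init_layer(BOTTLENECK_DIM, HIDDEN_DIM),
               init_layer(HIDDEN_DIM, SPEC_DIM)]

# Mapping DNN: bottleneck features -> target speaker's spectral features.
mapping_dnn = [init_layer(BOTTLENECK_DIM, HIDDEN_DIM),
               init_layer(HIDDEN_DIM, SPEC_DIM)]


def extract_bottleneck(spectral_frames):
    """Run the DAE encoder and return the bottleneck activations."""
    h = spectral_frames
    for layer in dae_encoder:
        h = dense(h, layer)
    return h


def reconstruct(bottleneck):
    """Run the DAE decoder (the reconstruction path used to train the DAE)."""
    h = dense(bottleneck, dae_decoder[0])
    return dense(h, dae_decoder[1], activation=linear)


def convert(source_frames):
    """Map source frames to target-speaker spectral features via the bottleneck."""
    h = extract_bottleneck(source_frames)
    h = dense(h, mapping_dnn[0])
    return dense(h, mapping_dnn[1], activation=linear)


# Five random frames stand in for spectral features of a source utterance.
source_frames = rng.normal(size=(5, SPEC_DIM))
bottleneck = extract_bottleneck(source_frames)
converted = convert(source_frames)
print(bottleneck.shape, converted.shape)
```

In this sketch the mapping DNN only ever sees bottleneck features paired with the target speaker's spectral features, which is why no source-speaker data would be needed to train it. Training loops, the spectral feature extraction itself, and waveform synthesis are omitted.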
