Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses
Suhita Ghosh, Yamini Sinha, Ingo Siegert, Sebastian Stober

TL;DR
This paper aims to improve voice conversion for dissimilar speakers by adding perceptual losses that enhance the naturalness and similarity of the converted speech, with the goal of better anonymizing speaker identity while maintaining intelligibility.
Contribution
It introduces a novel voice conversion method utilizing perceptual losses to improve conversion quality for dissimilar speakers, advancing privacy-preserving speech processing.
Findings
Enhanced voice conversion quality demonstrated in experiments
Improved speaker similarity metrics achieved
Better anonymization of speaker identity
Abstract
The rising trend of using voice as a means of interacting with smart devices has sparked worries over the protection of users' privacy and data security. These concerns have become more pressing, especially after the European Union's adoption of the General Data Protection Regulation (GDPR). The information contained in an utterance encompasses critical personal details about the speaker, such as their age, gender, socio-cultural origins and more. If there is a security breach and the data is compromised, attackers may utilise the speech data to circumvent speaker verification systems or imitate authorised users. Therefore, it is pertinent to anonymise the speech data before it is shared across devices, such that the source speaker of the utterance cannot be traced. Voice conversion (VC) can be used to achieve speech anonymisation, which involves altering the speaker's…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems
