Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
Anastasia Avdeeva, Aleksei Gusev

TL;DR
This paper introduces SpeakerVC, a lightweight zero-shot voice conversion model capable of converting both voiced and whispered speech with high speaker similarity, even in streaming mode, addressing gaps in domain-specific and whispered speech conversion.
Contribution
The paper presents a novel zero-shot voice conversion model that effectively handles whispered and voiced speech, improving speaker similarity and enabling streaming mode operation.
Findings
High speaker similarity in generated speech
Effective zero-shot conversion for whispered speech
Streaming mode operation without significant quality degradation
Abstract
Zero-shot voice conversion aims to transfer the voice of a source speaker to that of a speaker unseen during training, while preserving the content information. Although various methods have been proposed to reconstruct speaker information in generated speech, there is still room for improvement in achieving high similarity between generated and ground truth recordings. Furthermore, zero-shot voice conversion for speech in specific domains, such as whispered, remains an unexplored area. To address this problem, we propose a SpeakerVC model that can effectively perform zero-shot speech conversion in both voiced and whispered domains, while being lightweight and capable of running in streaming mode without significant quality degradation. In addition, we explore methods to improve the quality of speaker identity transfer and demonstrate their effectiveness for a variety of voice…
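The abstract refers to "similarity between generated and ground truth recordings" but the excerpt here does not state how that similarity is scored. As an illustration only, one common convention in voice conversion work is to compare fixed-size speaker embeddings with cosine similarity. The sketch below assumes that convention; the function name, the tiny 4-dimensional vectors, and the idea that embeddings come from some external speaker encoder are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (assumption): in practice these would come from a
# speaker encoder applied to the target recording and to the converted speech.
target_embedding = [0.2, 0.9, -0.1, 0.4]
converted_embedding = [0.25, 0.8, -0.05, 0.5]

score = cosine_similarity(target_embedding, converted_embedding)
print(f"speaker similarity (cosine): {score:.3f}")
```

Under this convention, a score closer to 1 means the converted speech sits nearer to the target speaker in embedding space; which encoder and which threshold count as "high similarity" depend on the evaluation setup.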
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
