Voice Cloning: Comprehensive Survey
Hussam Azzuni, Abdulmotaleb El Saddik

TL;DR
This survey provides a comprehensive overview of voice cloning technologies, standardizes terminology, discusses variations like few-shot and zero-shot, and reviews evaluation metrics and datasets to guide future research and address misuse.
Contribution
It offers a standardized terminology framework and compiles existing algorithms, highlighting key variations and evaluation methods in voice cloning research.
Findings
Standardized voice cloning terminology established
Survey of existing voice cloning algorithms compiled
Evaluation metrics and datasets reviewed
Abstract
Voice cloning has advanced rapidly in today's digital world, with many researchers and corporations working to improve these algorithms for various applications. This article aims to establish a standardized terminology for voice cloning and explore its different variations. It covers speaker adaptation as the fundamental concept and then delves deeper into topics such as few-shot, zero-shot, and multilingual TTS within that context. Finally, it explores the evaluation metrics commonly used in voice cloning research and related datasets. This survey compiles the available voice cloning algorithms to encourage research on both generating and detecting cloned speech, with the aim of limiting its misuse.
