Voice Cloning: Comprehensive Survey

Hussam Azzuni; Abdulmotaleb El Saddik

arXiv:2505.00579·cs.SD·May 2, 2025

TL;DR

This survey gives an overview of voice cloning, proposes a standardized terminology, covers variants such as few-shot, zero-shot, and multilingual TTS, and reviews evaluation metrics and datasets, with the aim of supporting research on both generation and detection to limit misuse.

Contribution

It offers a standardized terminology framework and compiles existing algorithms, highlighting key variations and evaluation methods in voice cloning research.

Findings

01 Standardized voice cloning terminology established

02 Survey of existing voice cloning algorithms compiled

03 Evaluation metrics and datasets reviewed

Abstract

Voice cloning has advanced rapidly, with many researchers and corporations working to improve these algorithms for various applications. This article aims to establish a standardized terminology for voice cloning and to explore its variations. It covers speaker adaptation as the fundamental concept, then examines few-shot, zero-shot, and multilingual TTS within that context. Finally, it reviews the evaluation metrics commonly used in voice cloning research and the related datasets. By compiling the available voice cloning algorithms, the survey aims to encourage research on both generating and detecting cloned voices so that misuse can be limited.
