Voice "Cloning" is Style Transfer
Kaitlyn Zhou, Federico Bianchi, Martijn Bartelds, Anna Pot, Yongchan Kwon, James Zou

TL;DR
Voice cloning technology primarily performs style transfer rather than faithful voice replication, homogenizing speaker traits and increasing perceived authority and trustworthiness.
Contribution
This work shows that voice cloning models function as style transfer systems, highlighting their limitations and their potential to influence human perception and behavior.
Findings
Cloned voices are perceived as more authoritative and trustworthy.
Voice cloning causes homogenization of speaker characteristics.
Human annotators report greater trust in cloned voices than in source voices, and greater willingness to disclose sensitive personal information to them.
Abstract
Artificially generated speech is increasingly embedded in everyday life. Voice cloning in particular enables applications where identity preservation is important, such as completing a recording, dubbing in a new language, or preserving the voices of individuals with speech loss. However, we find that, despite the term, voice cloning does not faithfully "clone" an individual's voice. Instead, widely used voice cloning models systematically apply style transfer to source voices. As rated by human annotators, cloned voices are perceived as more authoritative, warm, customer-service-like, and human-like than their sources. Human annotators also report greater trust in cloned voices than in source voices, and a greater willingness to disclose sensitive personal information to them. Our work furthermore shows that voice cloning leads to homogenization of…
