Audio Avatar Fingerprinting: An Approach for Authorized Use of Voice Cloning in the Era of Synthetic Audio
Candice R. Gerstner

TL;DR
This paper introduces "audio avatar fingerprinting", a way to verify whether synthetic speech is driven by an authorized identity. It extends speaker verification models to this forensic task and presents a new dataset for it.
Contribution
It applies speaker verification models to the problem of checking whether synthetic speech is driven by an authorized identity, and it contributes a new dataset for audio forensics in this setting.
Findings
Extended speaker verification models effectively detect synthetic speech.
New dataset enables research on authorized synthetic audio verification.
Demonstrated feasibility of verifying synthetic speech origin.
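As a rough illustration of what speaker verification means in general (not the paper's method, which this summary does not describe), a verifier commonly compares a fixed-length embedding of an enrolled speaker with the embedding of a test clip and accepts the clip when the two are similar enough. The sketch below assumes the embeddings are already computed by some model; the function names, the cosine-similarity score, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_authorized(enrolled_embedding, test_embedding, threshold=0.7):
    """Accept the test clip if its embedding is close to the enrolled one.

    The threshold is an illustrative assumption; in practice it would be
    tuned on held-out data.
    """
    return cosine_similarity(enrolled_embedding, test_embedding) >= threshold
```

In this sketch, a clip whose embedding points in nearly the same direction as the enrolled identity's embedding is accepted, and one that points elsewhere is rejected.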
Abstract
With the advancements in AI speech synthesis, it is easier than ever before to generate realistic audio in a target voice. One only needs a few seconds of reference audio from the target, quite literally putting words in the target person's mouth. This imposes a new set of forensics-related challenges on speech-based authentication systems, videoconferencing, and audio-visual broadcasting platforms, where we want to detect synthetic speech. At the same time, leveraging AI speech synthesis can enhance the different modes of communication through features such as low-bandwidth communication and audio enhancements, leading to ever-increasing legitimate use-cases of synthetic audio. In this case, we want to verify whether the synthesized voice is actually spoken by the user. This will require a mechanism to verify whether a given synthetic audio is driven by an authorized identity or not. We…
Taxonomy
Topics: Digital Media Forensic Detection · Speech and Audio Processing · Speech Recognition and Synthesis
