CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
Haibin Wu, Yuan Tseng, Hung-yi Lee

TL;DR
This paper introduces CodecFake, a new dataset of codec-based deepfake audio. It shows that current anti-spoofing models struggle to detect such audio and that training on CodecFake improves their detection capabilities.
Contribution
The paper creates the first codec-based deepfake audio dataset and shows that training on it makes anti-spoofing models more effective against codec-based speech synthesis systems.
Findings
Current SOTA anti-spoofing models trained on commonly used datasets fail to detect codec-based deepfakes.
The CodecFake dataset improves anti-spoofing detection performance.
Training on CodecFake enhances models' robustness against codec-based speech synthesis.
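One way to make "fail to detect" measurable is the equal error rate (EER), a metric widely used in anti-spoofing work. The summary above does not say which metric the paper reports, so treating EER as the yardstick is an assumption here. The sketch below is illustrative only: the function name `equal_error_rate` and all scores are hypothetical, and it assumes a detector that outputs higher scores for bona fide speech.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    bonafide_scores: Sequence[float],
    spoof_scores: Sequence[float],
) -> Tuple[float, float]:
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption: a higher score means "more likely bona fide".
    Returns (eer, threshold) at the point where the false acceptance
    rate and false rejection rate are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap = float("inf")
    best_eer = 1.0
    best_threshold = 0.0

    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < threshold for s in bonafide_scores) / len(bonafide_scores)

        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2.0
            best_threshold = threshold

    return best_eer, best_threshold


if __name__ == "__main__":
    # Hypothetical scores: a detector that separates the classes well...
    print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))
    # ...and one that cannot tell bona fide from spoofed audio at all.
    print(equal_error_rate([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]))
```

An EER near 0 means the detector separates bona fide from spoofed audio cleanly, while an EER near 0.5 is no better than guessing. Comparing the EER of the same model on codec-generated audio before and after training on a dataset such as CodecFake is one way the findings above could be checked.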
Abstract
Current state-of-the-art (SOTA) codec-based audio synthesis systems can mimic anyone's voice with just a 3-second sample from that specific unseen speaker. Unfortunately, malicious attackers may exploit these technologies, causing misuse and security issues. Anti-spoofing models have been developed to detect fake speech. However, the open question of whether current SOTA anti-spoofing models can effectively counter deepfake audios from codec-based speech synthesis systems remains unanswered. In this paper, we curate an extensive collection of contemporary SOTA codec models, employing them to re-create synthesized speech. This endeavor leads to the creation of CodecFake, the first codec-based deepfake audio dataset. Additionally, we verify that anti-spoofing models trained on commonly used datasets cannot detect synthesized speech from current codec-based speech generation systems. The…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
