CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems

Haibin Wu; Yuan Tseng; Hung-yi Lee

arXiv:2406.07237·eess.AS·June 12, 2024·1 cites

TL;DR

This paper introduces CodecFake, a new dataset of codec-based deepfake audio. It shows that current anti-spoofing models struggle to detect such audio and that training on CodecFake improves their detection capability.

Contribution

The paper creates the first codec-based deepfake audio dataset and shows how it enhances anti-spoofing models' effectiveness against modern speech synthesis systems.

Findings

01. Current SOTA anti-spoofing models, trained on commonly used datasets, fail to detect codec-based deepfakes.

02. The CodecFake dataset improves anti-spoofing detection performance.

03. Training on CodecFake enhances models' robustness against codec-based speech synthesis.
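
The findings above are stated in terms of detection performance. As an illustration only, the sketch below computes an equal error rate (EER), a metric commonly reported for anti-spoofing detectors. This page does not say which metric the paper uses, so the metric choice, the function name `equal_error_rate`, and the toy scores are all assumptions and not results from the paper.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping a decision threshold.

    Assumption: a higher score means "more likely bona fide".
    Returns the mean of the false-acceptance and false-rejection rates
    at the threshold where the two rates are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above all of them
    # (at that last threshold every utterance is rejected).
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Made-up scores for illustration; not taken from the paper.
    toy_bonafide = [0.9, 0.6, 0.4, 0.8]
    toy_spoof = [0.1, 0.5, 0.7, 0.2]
    print(equal_error_rate(toy_bonafide, toy_spoof))
```

A lower EER means better separation between bona fide and spoofed audio. Under this assumed metric, a finding such as "training on CodecFake improves detection" would show up as a lower EER on codec-generated test audio.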

Abstract

Current state-of-the-art (SOTA) codec-based audio synthesis systems can mimic anyone's voice with just a 3-second sample from that specific unseen speaker. Unfortunately, malicious attackers may exploit these technologies, causing misuse and security issues. Anti-spoofing models have been developed to detect fake speech. However, the open question of whether current SOTA anti-spoofing models can effectively counter deepfake audios from codec-based speech synthesis systems remains unanswered. In this paper, we curate an extensive collection of contemporary SOTA codec models, employing them to re-create synthesized speech. This endeavor leads to the creation of CodecFake, the first codec-based deepfake audio dataset. Additionally, we verify that anti-spoofing models trained on commonly used datasets cannot detect synthesized speech from current codec-based speech generation systems. The…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing