CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset

Xuanjun Chen; Jiawei Du; Haibin Wu; Lin Zhang; I-Ming Lin; I-Hsiang Chiu; Wenze Ren; Yuan Tseng; Yu Tsao; Jyh-Shing Roger Jang; Hung-yi Lee

arXiv:2501.08238 · cs.SD · March 19, 2025

TL;DR

This paper introduces CodecFake+, a large-scale dataset for detecting deepfake speech generated by neural audio codecs, along with a taxonomy to analyze codec features and improve detection methods.

Contribution

The paper presents the largest diverse CodecFake dataset and a taxonomy for codec analysis, enabling detailed evaluation and improved detection strategies for neural audio codec-based deepfakes.

Findings

01. Re-synthesized speech (CoRS) improves detection accuracy.

02. Detection is strongest with codecs using disentanglement objectives.

03. Taxonomy-guided data selection enhances detection performance.
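
As a rough illustration of what taxonomy-guided data selection could look like in practice, the sketch below filters training utterances by a taxonomy attribute of the codec that produced them. Everything here is hypothetical: the `CodecInfo` and `Utterance` types, the `auxiliary_objective` attribute, the codec names, and the file paths are placeholders for illustration, not the dataset's actual schema or the authors' selection procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecInfo:
    """Hypothetical taxonomy entry for one codec."""
    name: str
    auxiliary_objective: str  # assumed label, e.g. "disentanglement" or "none"


@dataclass(frozen=True)
class Utterance:
    """Hypothetical training sample tagged with the codec that re-synthesized it."""
    path: str
    codec: str


def select_by_taxonomy(utterances, codecs, attribute, value):
    """Keep only utterances whose codec has `attribute == value` in the taxonomy."""
    wanted = {c.name for c in codecs if getattr(c, attribute) == value}
    return [u for u in utterances if u.codec in wanted]


# Placeholder taxonomy and file list (not real dataset contents).
CODECS = [
    CodecInfo("codec_a", "disentanglement"),
    CodecInfo("codec_b", "none"),
    CodecInfo("codec_c", "disentanglement"),
]
UTTERANCES = [
    Utterance("a/0001.wav", "codec_a"),
    Utterance("b/0001.wav", "codec_b"),
    Utterance("c/0001.wav", "codec_c"),
]

subset = select_by_taxonomy(
    UTTERANCES, CODECS, "auxiliary_objective", "disentanglement"
)
print([u.path for u in subset])
```

A detector could then be trained on `subset` and compared against one trained on the full list; which attribute and value to select on is an experimental choice that this sketch does not settle.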

Abstract

With the rapid advancement of neural audio codecs, codec-based speech generation (CoSG) systems have become highly powerful. Unfortunately, CoSG also enables the creation of highly realistic deepfake speech, making it easier to mimic an individual's voice and spread misinformation. We refer to this emerging deepfake speech generated by CoSG systems as CodecFake. Detecting such CodecFake is an urgent challenge, yet most existing systems primarily focus on detecting fake speech generated by traditional speech synthesis models. In this paper, we introduce CodecFake+, a large-scale dataset designed to advance CodecFake detection. To our knowledge, CodecFake+ is the largest dataset encompassing the most diverse range of codec architectures. The training set is generated through re-synthesis using 31 publicly available open-source codec models, while the evaluation set includes web-sourced…

Code & Models

Datasets

CodecFake/CodecFake_Plus_Dataset
dataset · 264 downloads

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing