CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset
Xuanjun Chen, Jiawei Du, Haibin Wu, Lin Zhang, I-Ming Lin, I-Hsiang Chiu, Wenze Ren, Yuan Tseng, Yu Tsao, Jyh-Shing Roger Jang, Hung-yi Lee

TL;DR
This paper introduces CodecFake+, a large-scale dataset for detecting deepfake speech generated by neural audio codecs, along with a taxonomy to analyze codec features and improve detection methods.
Contribution
The paper presents the largest and most diverse CodecFake dataset to date, together with a taxonomy for analyzing codecs. This enables fine-grained evaluation and improved detection strategies for deepfakes built on neural audio codecs.
Findings
Re-synthesized speech (CoRS) improves detection accuracy.
Detection is strongest with codecs using disentanglement objectives.
Taxonomy-guided data selection enhances detection performance.
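One way to read "taxonomy-guided data selection" is to treat each codec as a point in a small space of categorical attributes and pick training codecs so that every occupied taxonomy cell is covered. The sketch below illustrates that idea only. The attribute names (`quantizer`, `aux_objective`, `decoder`), their values, the codec names, and the helper `select_by_taxonomy` are all hypothetical placeholders, not the paper's exact labels or selection procedure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecInfo:
    """Hypothetical metadata record for one codec in the training pool."""
    name: str
    quantizer: str       # placeholder taxonomy axis, e.g. "mvq" or "sq"
    aux_objective: str   # placeholder axis, e.g. "disentangle" or "none"
    decoder: str         # placeholder axis, e.g. "time" or "freq"


def select_by_taxonomy(codecs, per_group=1):
    """Pick up to `per_group` codecs from each distinct taxonomy cell.

    Grouping on the full (quantizer, aux_objective, decoder) tuple makes
    every attribute combination present in the pool appear in the subset.
    """
    groups = defaultdict(list)
    for c in codecs:
        groups[(c.quantizer, c.aux_objective, c.decoder)].append(c)
    selected = []
    for key in sorted(groups):
        # Sort members by name so the selection is deterministic.
        members = sorted(groups[key], key=lambda c: c.name)
        selected.extend(members[:per_group])
    return selected


if __name__ == "__main__":
    pool = [
        CodecInfo("codec_a", "mvq", "none", "time"),
        CodecInfo("codec_b", "mvq", "none", "time"),
        CodecInfo("codec_c", "mvq", "disentangle", "time"),
        CodecInfo("codec_d", "sq", "none", "freq"),
    ]
    print([c.name for c in select_by_taxonomy(pool)])
    # prints ['codec_c', 'codec_a', 'codec_d']
```

Selecting per cell rather than uniformly at random keeps rare codec designs from being crowded out by many near-identical codecs of a common design; the actual criteria used in CodecFake+ should be taken from the paper itself.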
Abstract
With the rapid advancement of neural audio codecs, codec-based speech generation (CoSG) systems have become highly powerful. Unfortunately, CoSG also enables the creation of highly realistic deepfake speech, making it easier to mimic an individual's voice and spread misinformation. We refer to this emerging deepfake speech generated by CoSG systems as CodecFake. Detecting such CodecFake is an urgent challenge, yet most existing systems primarily focus on detecting fake speech generated by traditional speech synthesis models. In this paper, we introduce CodecFake+, a large-scale dataset designed to advance CodecFake detection. To our knowledge, CodecFake+ is the largest dataset encompassing the most diverse range of codec architectures. The training set is generated through re-synthesis using 31 publicly available open-source codec models, while the evaluation set includes web-sourced…
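The re-synthesis step described above passes real speech through a codec's encoder and decoder and labels the output as fake. A minimal sketch of that loop follows, assuming each codec exposes an encode/decode pair. The uniform scalar quantizer in `make_toy_codec` is a toy stand-in for a neural codec, and `build_resynthesis_set`, the `"real"`/`"fake"` labels, and the ID scheme are hypothetical; a real pipeline would call the 31 open-source codec models instead.

```python
import math
from typing import Callable, Dict, List, Tuple

Waveform = List[float]
# A codec is modelled here as an (encode, decode) pair of functions.
Codec = Tuple[Callable[[Waveform], List[int]], Callable[[List[int]], Waveform]]


def make_toy_codec(levels: int) -> Codec:
    """Toy stand-in for a neural codec: uniform scalar quantization of [-1, 1]."""
    step = 2.0 / (levels - 1)

    def encode(wav: Waveform) -> List[int]:
        # Clip each sample to [-1, 1], then map it to its nearest level index.
        return [round((max(-1.0, min(1.0, x)) + 1.0) / step) for x in wav]

    def decode(codes: List[int]) -> Waveform:
        # Map level indices back to sample values.
        return [c * step - 1.0 for c in codes]

    return encode, decode


def build_resynthesis_set(real: Dict[str, Waveform], codecs: Dict[str, Codec]):
    """Keep each real utterance and add one re-synthesized copy per codec."""
    examples = []
    for utt_id, wav in real.items():
        examples.append((utt_id, "real", wav))
        for codec_name, (encode, decode) in codecs.items():
            fake = decode(encode(wav))  # re-synthesis: encode, then decode
            examples.append((f"{utt_id}_{codec_name}", "fake", fake))
    return examples


if __name__ == "__main__":
    real = {"utt1": [math.sin(0.1 * n) for n in range(16)]}
    codecs = {"q4": make_toy_codec(4), "q16": make_toy_codec(16)}
    for ex_id, label, _ in build_resynthesis_set(real, codecs):
        print(ex_id, label)
```

Because every fake sample is derived from a specific real utterance, the resulting set contains matched real/fake pairs that differ only in the codec artifacts a detector is meant to learn.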
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
