TL;DR
This paper introduces the Neural Codec Source Tracing (NCST) task for open-set neural codec classification, along with a new dataset and benchmark, highlighting challenges in classifying unseen real audio.
Contribution
The paper defines the NCST task, builds the ST-Codecfake dataset, and establishes a benchmark for open-set neural codec source tracing, a setting that earlier closed-set source tracing studies did not cover.
Findings
Models perform well in in-distribution classification
Models detect out-of-distribution samples effectively
Robustness drops when classifying unseen real audio
Abstract
Current research in audio deepfake detection is gradually transitioning from binary classification to multi-class tasks, referred to as the audio deepfake source tracing task. However, existing studies on source tracing consider only closed-set scenarios and have not addressed the challenges posed by open-set conditions. In this paper, we define the Neural Codec Source Tracing (NCST) task, which is capable of performing open-set neural codec classification and interpretable ALM detection. Specifically, we constructed the ST-Codecfake dataset for the NCST task, which includes bilingual audio samples generated by 11 state-of-the-art neural codec methods and ALM-based out-of-distribution (OOD) test samples. Furthermore, we establish a comprehensive source tracing benchmark to assess NCST models in open-set conditions. The experimental results reveal that although the NCST models perform well in…
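To make the open-set setting concrete, here is a minimal sketch of one common way to turn a closed-set classifier into an open-set one: thresholding the maximum softmax probability. This is an illustrative assumption, not the method confirmed by the abstract. The function names, the three-class logits, and the 0.5 threshold are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(logits, threshold=0.5):
    """Return the predicted class index, or -1 to flag the input as OOD.

    Assumption: an input is treated as out-of-distribution when the
    classifier's top softmax probability falls below `threshold`.
    The threshold value is hypothetical and would need tuning on
    held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else -1


# Hypothetical logits over three known codec classes.
confident = [5.0, 0.0, 0.0]    # one class clearly dominates
uncertain = [0.1, 0.0, 0.05]   # near-uniform scores, no clear source

print(open_set_predict(confident))
print(open_set_predict(uncertain))
```

With this rule, a sample whose scores are spread almost evenly across the known codecs is rejected as OOD instead of being forced into one of the known classes, which is the behavior an open-set benchmark is meant to evaluate.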
