How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection

Yixuan Xiao; Florian Lux; Alejandro Pérez-González-de-Martos; Ngoc Thang Vu

arXiv:2602.16343·cs.SD·February 19, 2026

TL;DR

This paper investigates how neural audio codecs' dual role in compressing and synthesizing speech impacts the effectiveness of audio deepfake detection, highlighting the importance of labeling strategies.

Contribution

It introduces a challenging extension of the ASVspoof 5 dataset and analyzes how labeling choices for codec-resynthesized audio affect deepfake detection performance.

Findings

01 Labeling strategies significantly influence detection accuracy.

02 Neural codecs' dual functionality complicates spoof detection.

03 The study provides insights into labeling strategies for deepfake detection.

Abstract

Since Text-to-Speech systems typically don't produce waveforms directly, recent spoof detection studies use resynthesized waveforms from vocoders and neural audio codecs to simulate an attacker. Unlike vocoders, which are specifically designed for speech synthesis, neural audio codecs were originally developed for compressing audio for storage and transmission. However, their ability to discretize speech also sparked interest in language-modeling-based speech synthesis. Owing to this dual functionality, codec resynthesized data may be labeled as either bonafide or spoof. So far, very little research has addressed this issue. In this study, we present a challenging extension of the ASVspoof 5 dataset constructed for this purpose. We examine how different labeling choices affect detection performance and provide insights into labeling strategies.
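
The labeling ambiguity described in the abstract can be made concrete with a small sketch. The example below is illustrative only and is not taken from the paper: the `Utterance` record, its fields, and the policy names `as_spoof` and `as_bonafide` are hypothetical. It assumes that synthesized speech is always labeled spoof, that untouched human speech is always labeled bonafide, and that only human speech resynthesized by a neural codec depends on the chosen policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical metadata record for one audio file."""
    origin: str           # "human" or "tts" (assumed categories)
    codec_resynth: bool   # True if the waveform passed through a neural codec


def label(utt: Utterance, codec_policy: str) -> str:
    """Return "bonafide" or "spoof" under a given codec labeling policy.

    codec_policy is a hypothetical switch:
      "as_spoof"    treats codec-resynthesized human speech as spoof
      "as_bonafide" treats it as bonafide (codec seen as plain compression)
    """
    if codec_policy not in ("as_spoof", "as_bonafide"):
        raise ValueError(f"unknown codec policy: {codec_policy!r}")
    if utt.origin == "tts":
        # Synthesized speech is spoof regardless of the codec policy.
        return "spoof"
    if utt.codec_resynth:
        # Only this case depends on the labeling choice.
        return "spoof" if codec_policy == "as_spoof" else "bonafide"
    return "bonafide"


if __name__ == "__main__":
    resynth = Utterance(origin="human", codec_resynth=True)
    for policy in ("as_spoof", "as_bonafide"):
        print(policy, "->", label(resynth, policy))
```

Training a detector once per policy on otherwise identical data would be one way to compare the effect of the labeling choice; the paper's actual protocol may differ.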

Taxonomy

Topics: Speech Recognition and Synthesis · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis