Towards Neural Audio Codec Source Parsing

Orchid Chetia Phukan; Girish; Mohd Mujtaba Akhtar; Arun Balaji Buduru; Rajesh Sharma

arXiv:2506.12627 · eess.AS · June 17, 2025

TL;DR

This paper introduces Neural Audio Codec Source Parsing (NACSP), an approach to source attribution of codecfakes that predicts the internal parameters of the neural audio codec (NAC) that generated them. It pairs NACSP with HYDRA, a framework based on hyperbolic geometry, and reports improved generalization and interpretability.

Contribution

The paper frames source attribution as structured regression over NAC parameters (NACSP) and introduces HYDRA, a hyperbolic-geometry framework, to improve multi-task learning of those parameters.

Findings

1. HYDRA outperforms baselines operating in Euclidean space on benchmark CF datasets.
2. NACSP effectively predicts NAC parameters for source attribution.
3. Hyperbolic geometry improves multi-task generalization.

Abstract

A new class of audio deepfakes, codecfakes (CFs), has recently attracted attention; these are synthesized by Audio Language Models that use neural audio codecs (NACs) in the backend. In response, the community has introduced dedicated benchmarks and tailored detection strategies. As the field advances, efforts have moved beyond binary detection toward source attribution, including open-set attribution, which aims to identify the NAC responsible for generation and to flag novel, unseen ones during inference. This shift toward source attribution improves forensic interpretability and accountability. However, open-set attribution remains fundamentally limited: while it can detect that a NAC is unfamiliar, it cannot characterize or identify individual unseen codecs. It treats such inputs as generic "unknowns", without insight into their internal configuration. This leads to major shortcomings: limited…
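To make the contrast between open-set attribution and parameter prediction concrete, here is a minimal Python sketch, assuming a classifier that outputs one logit per known codec and separate regression heads for codec parameters. Everything in it is hypothetical: the codec labels, the 0.7 confidence threshold, and the parameter fields (number of quantizers, bandwidth, sampling rate) are illustrative choices, not taken from the paper. The point is only that open-set attribution can return nothing more than a known label or "unknown", whereas a structured parameter estimate still describes an unseen codec.

```python
import math
from dataclasses import dataclass

# Hypothetical set of codecs seen during training.
KNOWN_CODECS = ["codec_A", "codec_B", "codec_C"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_attribution(logits, threshold=0.7):
    """Open-set attribution: pick the most likely known codec, or flag 'unknown'.

    The threshold is a hypothetical max-probability cutoff.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return KNOWN_CODECS[best] if probs[best] >= threshold else "unknown"


@dataclass
class CodecParams:
    # Hypothetical NAC configuration fields, for illustration only.
    num_quantizers: int
    bandwidth_kbps: float
    sample_rate_hz: float


def parse_codec(head_outputs):
    """Parameter regression: turn per-task head outputs into a structured estimate.

    The quantizer count is discrete, so its continuous prediction is rounded.
    """
    q, bw, sr = head_outputs
    return CodecParams(num_quantizers=max(1, round(q)), bandwidth_kbps=bw, sample_rate_hz=sr)


if __name__ == "__main__":
    logits = [1.0, 1.1, 0.9]  # no confident match among the known codecs
    print(open_set_attribution(logits))  # unknown
    print(parse_codec([7.6, 6.0, 24000.0]))
    # CodecParams(num_quantizers=8, bandwidth_kbps=6.0, sample_rate_hz=24000.0)
```

In this toy setup the same input that open-set attribution can only label "unknown" still yields an estimate of its configuration, which is the gap the abstract describes.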


Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques