Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
Girish, Mohd Mujtaba Akhtar, Orchid Chetia Phukan, Arun Balaji Buduru

TL;DR
This paper introduces Indic-CodecFake (ICF), a large-scale dataset of real and neural audio codec (NAC) synthesized Indic speech, and SATYAM, a novel hyperbolic Audio Large Language Model (ALM), to improve detection of codec-synthesized speech deepfakes in Indic languages.
Contribution
The paper presents Indic-CodecFake (ICF), the first large-scale benchmark of real and neural audio codec (NAC) synthesized speech in Indic languages, and SATYAM, a new hyperbolic ALM designed for deepfake detection in Indic languages that outperforms existing baselines on ICF.
Findings
State-of-the-art CodecFake (CF) detectors trained on English-centric datasets fail to generalize to Indic languages.
Current ALMs perform poorly on Indic deepfake detection in zero-shot settings.
SATYAM outperforms existing baselines on the Indic-CodecFake (ICF) benchmark.
Abstract
The rapid advancement of Audio Large Language Models (ALMs), driven by Neural Audio Codecs (NACs), has led to the emergence of highly realistic speech deepfakes, commonly referred to as CodecFakes (CFs). Consequently, CF detection has attracted increasing attention from the research community. However, existing studies predominantly focus on English or Chinese, leaving the vulnerability of Indic languages largely unexplored. To bridge this gap, we introduce the Indic-CodecFake (ICF) dataset, the first large-scale benchmark comprising real and NAC-synthesized speech across multiple Indic languages, diverse speaker profiles, and multiple NAC types. We use IndicSUPERB as the real speech corpus from which the ICF dataset is generated. Our experiments demonstrate that state-of-the-art (SOTA) CF detectors trained on English-centric datasets fail to generalize to ICF, underscoring the challenges posed by…
