Enhancing Generalization in Audio Deepfake Detection: A Neural Collapse based Sampling and Training Approach

Mohammed Yousif; Jonat John Mathew; Huzaifa Pallan; Agamjeet Singh Padda; Syed Daniyal Shah; Sara Adamski; Madhu Reddiboina; Arjun Pankajakshan

arXiv:2404.13008·cs.SD·April 22, 2024·1 cites

TL;DR

This paper introduces a neural collapse-based sampling method for audio deepfake detection that achieves comparable generalization on unseen data while using less training data and lower computational cost.

Contribution

It proposes a novel sampling approach that applies neural collapse to pre-trained models, each trained on a distinct dataset, to create a new training database for deepfake detection.

Findings

1. Comparable generalization on unseen data
2. Reduced training data requirements
3. Maintained detection performance

Abstract

Generalization in audio deepfake detection presents a significant challenge, with models trained on specific datasets often struggling to detect deepfakes generated under varying conditions and by unknown algorithms. While collectively training a model using diverse datasets can enhance its generalization ability, it comes with high computational costs. To address this, we propose a neural collapse-based sampling approach applied to pre-trained models trained on distinct datasets to create a new training database. Using the ASVspoof 2019 dataset as a proof-of-concept, we implement pre-trained models with ResNet and ConvNeXt architectures. Our approach demonstrates comparable generalization on unseen data while being computationally efficient, requiring less training data. Evaluation is conducted using the In-the-wild dataset.
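
The abstract does not spell out the selection rule, so the following is only a minimal sketch of what neural collapse-based sampling could look like. Neural collapse describes how, late in training, the penultimate-layer embeddings of each class cluster tightly around their class mean. The sketch assumes (our assumption, not a detail stated by the authors) that a sample's distance to its class mean is a reasonable selection score, and keeps the fraction of each class that lies closest to that mean. The function names `class_means` and `sample_near_means` and the `fraction` parameter are hypothetical.

```python
import math
from collections import defaultdict


def class_means(embeddings, labels):
    """Return {label: mean embedding vector} for each class."""
    sums, counts = {}, defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def sample_near_means(embeddings, labels, fraction):
    """Keep, per class, the `fraction` of samples closest to the class mean.

    Assumption: "closest to the class mean" is an illustrative selection
    rule; the abstract does not state the rule the authors actually use.
    Returns the sorted indices of the selected samples.
    """
    means = class_means(embeddings, labels)
    by_class = defaultdict(list)
    for idx, (vec, lab) in enumerate(zip(embeddings, labels)):
        # Euclidean distance from the sample embedding to its class mean.
        by_class[lab].append((math.dist(vec, means[lab]), idx))

    selected = []
    for scored in by_class.values():
        keep = max(1, round(fraction * len(scored)))  # at least one per class
        selected.extend(idx for _, idx in sorted(scored)[:keep])
    return sorted(selected)
```

Under this reading, the embeddings would come from the penultimate layer of each pre-trained model on its own dataset, and the union of the selected indices across datasets would form the new, smaller training database.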

Taxonomy

Topics: Music and Audio Processing