Evaluating Fake Music Detection Performance Under Audio Augmentations
Tomasz Sroka, Tomasz Wężowicz, Dominik Sidorczuk, Mateusz Modrzejewski

TL;DR
This paper investigates how audio augmentations impact the robustness of fake music detection models, revealing significant performance drops even with light transformations, thus highlighting challenges in model generalization.
Contribution
The study provides a comprehensive evaluation of fake music detection robustness under various audio augmentations using a newly constructed dataset.
Findings
Model performance decreases significantly with audio augmentations.
Light augmentations can substantially impair detection accuracy.
The dataset enables better assessment of model generalization.
Abstract
With the rapid advancement of generative audio models, distinguishing between human-composed and generated music is becoming increasingly challenging. In response, models for detecting fake music have been proposed. In this work, we explore the robustness of such systems under audio augmentations. To evaluate model generalization, we construct a dataset consisting of both real music and synthetic music generated by several systems. We then apply a range of audio transformations and analyze how they affect classification accuracy, testing a recent state-of-the-art musical deepfake detection model in the presence of these augmentations. The performance of the model decreases significantly even with the introduction of light augmentations.
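The evaluation protocol described in the abstract (transform each clip, re-run the detector, compare accuracy against the clean baseline) can be sketched as follows. This is a minimal illustration under stated assumptions. The two augmentations (a gain change and additive white noise), all function names, and the toy loudness-based detector and sine-wave dataset are hypothetical stand-ins chosen for the sketch. They are not the augmentations, model, or dataset used in the paper.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Signal = List[float]
Example = Tuple[Signal, int]  # (waveform samples, label); label 1 = generated, 0 = real


def apply_gain(signal: Sequence[float], gain_db: float) -> Signal:
    """Scale the waveform by gain_db decibels (a light loudness change)."""
    factor = 10 ** (gain_db / 20)
    return [x * factor for x in signal]


def add_white_noise(signal: Sequence[float], snr_db: float, seed: int = 0) -> Signal:
    """Add Gaussian white noise scaled to the requested signal-to-noise ratio."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    rng = random.Random(seed)  # fixed seed keeps the evaluation reproducible
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def accuracy(detector: Callable[[Signal], int], data: Sequence[Example],
             augment: Callable[[Signal], Signal] = list) -> float:
    """Fraction of clips labeled correctly after applying `augment` to each one."""
    correct = sum(detector(augment(x)) == y for x, y in data)
    return correct / len(data)


def robustness_report(detector: Callable[[Signal], int], data: Sequence[Example],
                      augmentations: Dict[str, Callable[[Signal], Signal]]) -> Dict[str, float]:
    """Accuracy on clean audio, then under each named augmentation."""
    report = {"clean": accuracy(detector, data)}
    for name, augment in augmentations.items():
        report[name] = accuracy(detector, data, augment)
    return report


# --- Hypothetical toy setup, NOT the detector or dataset from the paper ---

def toy_detector(signal: Signal) -> int:
    """Deliberately brittle: calls a clip 'generated' (1) if it is quiet."""
    mean_abs = sum(abs(x) for x in signal) / len(signal)
    return 1 if mean_abs < 0.2 else 0


def make_toy_dataset(sample_rate: int = 800) -> List[Example]:
    """One-second sine clips: loud ones stand in for real, quiet ones for generated."""
    def tone(freq: float, amp: float) -> Signal:
        return [amp * math.sin(2 * math.pi * freq * i / sample_rate)
                for i in range(sample_rate)]
    return [(tone(f, 0.8), 0) for f in (3, 5)] + [(tone(f, 0.2), 1) for f in (3, 5)]


if __name__ == "__main__":
    report = robustness_report(
        toy_detector,
        make_toy_dataset(),
        {
            "gain_+6dB": lambda s: apply_gain(s, 6.0),
            "noise_20dB_snr": lambda s: add_white_noise(s, 20.0),
        },
    )
    for condition, acc in report.items():
        print(f"{condition}: accuracy = {acc:.2f}")
```

Because the toy detector relies on loudness, a +6 dB gain change alone drops its accuracy from 1.00 to 0.50, while the 20 dB noise condition leaves it unaffected. This only shows how a per-condition report can expose a brittle cue. It says nothing about which transformations affect the model studied in the paper.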
Taxonomy
Topics: Music and Audio Processing · Digital Media Forensic Detection · Music Technology and Sound Studies
