
TL;DR
This paper presents a method for detecting AI-generated music deepfakes using spectrogram-based CNN classifiers, addressing ethical concerns and emphasizing the importance of detection systems for protecting artists.
Contribution
It introduces a CNN-based approach trained on modified audio data to identify deepfake music, highlighting the need for ethical safeguards in Text-to-Music (TTM) platforms.
Findings
The CNN classifier achieves high accuracy on the tempo-stretched and pitch-shifted dataset
Tempo stretching and pitch shifting simulate real-world adversarial conditions
Detection systems are crucial for protecting artists and ensuring ethical use
Abstract
The proliferation of Text-to-Music (TTM) platforms has democratized music creation, enabling users to effortlessly generate high-quality compositions. However, this innovation also presents new challenges to musicians and the broader music industry. This study investigates the detection of AI-generated songs using the FakeMusicCaps dataset by classifying audio as either deepfake or human. To simulate real-world adversarial conditions, tempo stretching and pitch shifting were applied to the dataset. Mel spectrograms were generated from the modified audio, then used to train and evaluate a convolutional neural network. In addition to presenting technical results, this work explores the ethical and societal implications of TTM platforms, arguing that carefully designed detection systems are essential to both protecting artists and unlocking the positive potential of generative AI in music.
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies
