Echoes: A semantically-aligned music deepfake detection dataset

Octavian Pascu; Dan Oneata; Horia Cucu; Nicolas M. Muller

arXiv:2603.23667 · cs.SD · March 26, 2026

TL;DR

Echoes is a challenging new dataset for music deepfake detection that emphasizes semantic alignment and diversity, leading to improved generalization of detection models across different datasets.

Contribution

The paper introduces Echoes, a large, diverse, and semantically-aligned music deepfake dataset designed to enhance training and benchmarking of detection methods.

Findings

01 Among the datasets evaluated, Echoes is the hardest in the in-domain setting.

02 Detectors trained on existing datasets perform poorly on Echoes.

03 Training on Echoes improves generalization to other datasets.

Abstract

We introduce Echoes, a new dataset for music deepfake detection designed for training and benchmarking detectors under realistic and provider-diverse conditions. Echoes comprises 3,577 tracks (110 hours of audio) spanning multiple genres (pop, rock, electronic), and includes content generated by ten popular AI music generation systems. To prevent shortcut learning and promote robust generalization, the dataset is deliberately constructed to be challenging, enforcing semantic-level alignment between spoofed audio and bona fide references. This alignment is achieved by conditioning generated audio samples directly on bona fide waveforms or song descriptors. We evaluate Echoes in a cross-dataset setting against three existing AI-generated music datasets using state-of-the-art Wav2Vec2 XLS-R 2B representations. Results show that (i) Echoes is the hardest in-domain dataset; (ii) detectors…
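
The cross-dataset protocol mentioned in the abstract (train a detector on one dataset, then test it on every dataset, including its own) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the synthetic vectors stand in for pooled audio embeddings, the nearest-centroid classifier stands in for whatever detector head the authors use, accuracy stands in for their metric, and the names `toy_a` and `toy_b` are hypothetical, not real datasets.

```python
import random
from statistics import fmean


def make_toy_dataset(shift, n=200, dim=8, seed=0):
    """Hypothetical stand-in for pooled audio embeddings.

    Returns (vector, label) pairs; label 1 = spoofed, 0 = bona fide.
    A larger `shift` makes the two classes easier to separate.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        vec = [rng.gauss(shift * label, 1.0) for _ in range(dim)]
        data.append((vec, label))
    return data


def fit_centroids(train):
    """Toy detector (an assumption): one mean vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [vec for vec, y in train if y == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, vec):
    """Assign the label of the closest class centroid."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def accuracy(centroids, test):
    """Fraction of test items classified correctly."""
    return fmean(1.0 if predict(centroids, vec) == y else 0.0 for vec, y in test)


def cross_dataset_matrix(datasets):
    """Train on each dataset, evaluate on every dataset.

    `datasets` maps a name to {"train": [...], "test": [...]}.
    Returns {(train_name, test_name): accuracy}; diagonal entries are
    the in-domain results, off-diagonal entries the cross-dataset ones.
    """
    results = {}
    for train_name, train_ds in datasets.items():
        model = fit_centroids(train_ds["train"])
        for test_name, test_ds in datasets.items():
            results[(train_name, test_name)] = accuracy(model, test_ds["test"])
    return results


# Hypothetical usage with two synthetic datasets of different difficulty.
datasets = {
    "toy_a": {"train": make_toy_dataset(3.0, seed=0), "test": make_toy_dataset(3.0, seed=1)},
    "toy_b": {"train": make_toy_dataset(0.5, seed=2), "test": make_toy_dataset(0.5, seed=3)},
}
matrix = cross_dataset_matrix(datasets)
for (train_name, test_name), acc in sorted(matrix.items()):
    print(f"train={train_name} test={test_name} accuracy={acc:.3f}")
```

Reading such a matrix row by row shows how well a detector trained on one source transfers to the others, which is the kind of comparison findings (ii) and (iii) refer to.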

Code & Models

Datasets

Octavian97/Echoes
dataset · 22 downloads

Taxonomy

Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies