Melody or Machine: Detecting Synthetic Music with Dual-Stream Contrastive Learning

Arnesh Batra; Dev Sharma; Krish Thukral; Ruhani Bhatia; Naman Batra; Aditya Gautam

arXiv:2512.00621 · cs.SD · December 2, 2025

Open Access

TL;DR

This paper introduces Melody or Machine (MoM), a benchmark of over 130,000 songs (6,665 hours) for synthetic music detection, and proposes CLAM, a dual-stream contrastive learning architecture that improves detection accuracy and generalization to out-of-distribution synthetic music.

Contribution

The paper presents MoM, a comprehensive benchmark dataset, and CLAM, a new dual-stream contrastive learning model, advancing the robustness and accuracy of synthetic music detection.

Findings

01. CLAM achieves an F1 score of 0.925 on the MoM benchmark.

02. MoM is the most diverse synthetic music dataset to date.

03. CLAM outperforms previous models in generalization to out-of-distribution synthetic content.
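
For readers unfamiliar with the metric in the first finding, F1 is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts for a binary real-versus-synthetic classifier. The function name `f1_from_counts` and the example counts are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from the paper, and this page does not state which averaging or decision threshold the authors used.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score for the positive ("synthetic") class.

    tp: synthetic songs correctly flagged as synthetic
    fp: real songs wrongly flagged as synthetic
    fn: synthetic songs missed by the detector
    """
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # No positive examples and no positive predictions: define F1 as 0.
        return 0.0
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * tp / denominator


# Hypothetical counts (not from the paper): 925 hits, 75 false alarms, 75 misses.
example_f1 = f1_from_counts(tp=925, fp=75, fn=75)
print(round(example_f1, 3))
```

With these illustrative counts, precision and recall are both 925/1000, so the harmonic mean is also 0.925. A detector with unequal precision and recall can reach the same F1, which is why the single number alone does not say which kind of error dominates.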

Abstract

The rapid evolution of end-to-end AI music generation poses an escalating threat to artistic authenticity and copyright, demanding detection methods that can keep pace. While foundational, existing models like SpecTTTra falter when faced with the diverse and rapidly advancing ecosystem of new generators, exhibiting significant performance drops on out-of-distribution (OOD) content. This generalization failure highlights a critical gap: the need for more challenging benchmarks and more robust detection architectures. To address this, we first introduce Melody or Machine (MoM), a new large-scale benchmark of over 130,000 songs (6,665 hours). MoM is the most diverse dataset to date, built with a mix of open and closed-source models and a curated OOD test set designed specifically to foster the development of truly generalizable detectors. Alongside this benchmark, we introduce CLAM, a…

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis