M6: Multi-generator, Multi-domain, Multi-lingual and cultural, Multi-genres, Multi-instrument Machine-Generated Music Detection Databases
Yupei Li, Hanqian Li, Lucia Specia, Björn W. Schuller

TL;DR
The paper introduces M6, a comprehensive, diverse dataset for machine-generated music detection, addressing a critical need for robust benchmarks to improve detection methods and protect the value of human compositions.
Contribution
M6 is the first large-scale, multi-dimensional dataset for machine-generated music detection (MGMD). It covers multiple generators, domains, languages, cultures, genres, and instruments to support MGMD research.
Findings
Baseline models show significant room for improvement.
Diversity of M6 enhances robustness of MGMD research.
Data analysis reveals how complex it is to distinguish machine-generated music from human music.
Abstract
Machine-generated music (MGM) has emerged as a powerful tool with applications in music therapy, personalised editing, and creative inspiration for the music community. However, its unregulated use threatens the entertainment, education, and arts sectors by diminishing the value of high-quality human compositions. Machine-generated music detection (MGMD) is, therefore, critical to safeguarding these domains, yet the field lacks comprehensive datasets to support meaningful progress. To address this gap, we introduce M6, a large-scale benchmark dataset tailored for MGMD research. M6 is distinguished by its diversity, encompassing multiple generators, domains, languages, cultural contexts, genres, and instruments. We outline our methodology for data selection and collection, accompanied by detailed data analysis, and provide all music in WAV format. Additionally, we provide baseline…
