SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
Md Awsafur Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, Shaikh Anowarul Fattah

TL;DR
This paper introduces SONICS, a large dataset for detecting AI-generated synthetic songs, and SpecTTTra, a new efficient architecture that models long-range dependencies to improve detection accuracy and efficiency.
Contribution
The paper presents SONICS, a comprehensive dataset for end-to-end fake song detection, and proposes SpecTTTra, an innovative architecture that enhances long-range temporal modeling with better efficiency.
Findings
SpecTTTra outperforms ViT by 8% in F1 score on long songs.
On long songs, SpecTTTra is also 38% faster and uses 26% less memory than ViT.
The SONICS dataset includes over 97k songs (4,751 hours), addressing gaps in earlier datasets: limited music-lyrics diversity, no long-duration songs, and no open-access fake songs.
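Because the headline comparison is reported as an F1 score, a short reminder of how that metric is computed for a binary real-vs-fake classifier may help. This is a minimal sketch in plain Python, not the paper's evaluation code; the function name `f1_score`, the convention that label 1 means "fake", and the toy labels are all assumptions made for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class.

    Assumption for illustration: label 1 means "fake" (synthetic) song.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up labels, not SONICS data): 3 fake and 3 real songs.
labels = [1, 1, 1, 0, 0, 0]
preds = [1, 1, 0, 0, 0, 1]
print(round(f1_score(labels, preds), 3))  # prints 0.667
```

In the toy example there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and the F1 score is 2/3 as well.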
Abstract
The recent surge in AI-generated songs presents exciting possibilities and challenges. These innovations necessitate the ability to distinguish between human-composed and synthetic songs to safeguard artistic integrity and protect human musical artistry. Existing research and datasets in fake song detection only focus on singing voice deepfake detection (SVDD), where the vocals are AI-generated but the instrumental music is sourced from real songs. However, these approaches are inadequate for detecting contemporary end-to-end artificial songs where all components (vocals, music, lyrics, and style) could be AI-generated. Additionally, existing datasets lack music-lyrics diversity, long-duration songs, and open-access fake songs. To address these gaps, we introduce SONICS, a novel dataset for end-to-end Synthetic Song Detection (SSD), comprising over 97k songs (4,751 hours) with over 49k…
Code & Models
- 🤗 awsaf49/sonics-spectttra-gamma-5s (model, 797 downloads)
- 🤗 awsaf49/sonics-spectttra-alpha-5s (model, 264 downloads)
- 🤗 awsaf49/sonics-spectttra-alpha-120s (model, 9.8k downloads)
- 🤗 awsaf49/sonics-spectttra-beta-5s (model, 70 downloads)
- 🤗 awsaf49/sonics-spectttra-beta-120s (model, 76 downloads)
- 🤗 awsaf49/sonics-spectttra-gamma-120s (model, 718 downloads)
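The checkpoint names listed above appear to follow a common pattern: a variant name (alpha, beta, or gamma) followed by an input clip length in seconds (5s or 120s). The sketch below splits a repository ID into those two parts, which can be handy when picking a checkpoint for short or long clips. The interpretation of the name parts is an assumption read off the names themselves, and `parse_checkpoint` is a hypothetical helper, not part of any SONICS or Hugging Face API.

```python
import re

# Assumed naming pattern, inferred only from the six IDs listed above:
#   <owner>/sonics-spectttra-<variant>-<seconds>s
CHECKPOINT_PATTERN = re.compile(
    r"^(?P<owner>[^/]+)/sonics-spectttra-"
    r"(?P<variant>alpha|beta|gamma)-(?P<seconds>\d+)s$"
)


def parse_checkpoint(repo_id):
    """Return (variant, clip_seconds) for a SpecTTTra checkpoint ID.

    Hypothetical helper for illustration; raises ValueError on other IDs.
    """
    match = CHECKPOINT_PATTERN.match(repo_id)
    if match is None:
        raise ValueError(f"unrecognized checkpoint id: {repo_id!r}")
    return match.group("variant"), int(match.group("seconds"))


checkpoints = [
    "awsaf49/sonics-spectttra-gamma-5s",
    "awsaf49/sonics-spectttra-alpha-120s",
]

# Keep only the checkpoints whose names indicate 120-second inputs.
long_clip = [c for c in checkpoints if parse_checkpoint(c)[1] == 120]
print(long_clip)  # prints ['awsaf49/sonics-spectttra-alpha-120s']
```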
Taxonomy
Topics: Diverse Musicological Studies · Music History and Culture · Music and Audio Processing
Methods: ConvNeXt · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus
