Singing Voice Graph Modeling for SingFake Detection

Xuanjun Chen; Haibin Wu; Jyh-Shing Roger Jang; Hung-yi Lee

arXiv:2406.03111 · eess.AS · June 4, 2025

TL;DR

This paper introduces SingGraph, a model that combines acoustic analysis (pitch and rhythm) with linguistic analysis (lyrics) to detect singing voice deepfakes. It achieves state-of-the-art results in three evaluation scenarios on the SingFake dataset.

Contribution

The paper presents SingGraph, integrating pitch, rhythm, and lyrics analysis with music domain augmentation techniques to improve SingFake detection performance.

Findings

01. Achieves 13.2% relative EER reduction for seen singers
02. Achieves 24.3% relative EER reduction for unseen singers
03. Achieves 37.1% relative EER reduction for unseen singers with different codecs
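The findings above are stated as relative reductions in equal error rate (EER), not absolute percentage-point drops. As a minimal sketch of that arithmetic, the helper below computes a relative reduction from a baseline EER and a new EER. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_eer_reduction(baseline_eer: float, new_eer: float) -> float:
    """Return the relative EER reduction in percent.

    Both arguments are EERs in the same unit (e.g. percent).
    The result is 100 * (baseline - new) / baseline.
    """
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical values for illustration only (NOT results from the paper):
# a drop from 10.0% EER to 8.0% EER is a 2-point absolute drop,
# which corresponds to a 20% relative reduction.
print(relative_eer_reduction(10.0, 8.0))
```

Read this way, a "13.2% relative EER reduction" means the new EER is 86.8% of the previous model's EER in that scenario, whatever the absolute values are.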

Abstract

Detecting singing voice deepfakes, or SingFake, involves determining the authenticity and copyright of a singing voice. Existing models for speech deepfake detection have struggled to adapt to unseen attacks in this unique singing voice domain of human vocalization. To bridge the gap, we present a groundbreaking SingGraph model. The model synergizes the capabilities of the MERT acoustic music understanding model for pitch and rhythm analysis with the wav2vec2.0 model for linguistic analysis of lyrics. Additionally, we advocate for using RawBoost and beat matching techniques grounded in music domain knowledge for singing voice augmentation, thereby enhancing SingFake detection performance. Our proposed method achieves new state-of-the-art (SOTA) results within the SingFake dataset, surpassing the previous SOTA model across three distinct scenarios: it improves EER relatively for seen…
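The abstract reports performance as EER, the operating point where the false acceptance rate (deepfakes accepted as real) equals the false rejection rate (real singing rejected). The sketch below shows one simple way to estimate it from detector scores. It assumes that higher scores mean "more likely bona fide" and picks the threshold where the two rates are closest; the function name is hypothetical, and the paper's own evaluation script may interpolate differently.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate (as a fraction in [0, 1]).

    Assumption: a higher score means the detector considers the
    clip more likely to be bona fide (real singing).
    """
    # Candidate thresholds: every distinct score that was observed.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: bona fide clips scoring below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed clips scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores for illustration only (not data from the paper).
real = [0.6, 0.8, 0.9, 0.4]
fake = [0.1, 0.3, 0.5, 0.7]
print(compute_eer(real, fake))
```

With a finite set of scores the two error rates may never be exactly equal, so this sketch averages them at the closest threshold. That is a common approximation, not the only convention.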

Code & Models

Repositories

xjchengit/singgraph (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing