Singing Voice Graph Modeling for SingFake Detection
Xuanjun Chen, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee

TL;DR
This paper introduces SingGraph, a novel model combining acoustic and linguistic analysis for detecting singing voice deepfakes, achieving state-of-the-art results across various scenarios in the SingFake dataset.
Contribution
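The paper presents SingGraph, which combines the MERT acoustic music understanding model (pitch and rhythm analysis) with the wav2vec2.0 model (linguistic analysis of lyrics), and uses RawBoost and beat matching as music-domain augmentation to improve SingFake detection performance.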
Findings
Achieves 13.2% relative EER reduction over the previous SOTA model for seen singers
Achieves 24.3% relative EER reduction over the previous SOTA model for unseen singers
Achieves 37.1% relative EER reduction over the previous SOTA model for unseen singers with different codecs
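The findings above are stated as relative reductions in equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following minimal sketch shows one common way to estimate EER from detector scores and how a relative reduction is computed. The function names, the convention that higher scores mean "bona fide", and all numbers are illustrative assumptions, not values or code from the paper.

```python
def equal_error_rate(scores, labels):
    """Estimate EER by sweeping a threshold over the observed scores.

    Assumption: label 1 = bona fide, label 0 = deepfake, and a higher
    score means the detector believes the clip is bona fide.
    """
    bona = [s for s, y in zip(scores, labels) if y == 1]
    fake = [s for s, y in zip(scores, labels) if y == 0]
    if not bona or not fake:
        raise ValueError("need at least one sample of each class")

    best_gap, eer = None, None
    for t in sorted(set(scores)):
        # Deepfakes accepted as bona fide at threshold t.
        far = sum(s >= t for s in fake) / len(fake)
        # Bona fide clips rejected at threshold t.
        frr = sum(s < t for s in bona) / len(bona)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Fraction by which new_eer improves on baseline_eer."""
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return (baseline_eer - new_eer) / baseline_eer


# Toy scores, invented purely for illustration.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)

# Hypothetical baseline of 10.0% EER improved to 8.0% EER.
toy_gain = relative_reduction(10.0, 8.0)
```

A relative reduction is not an absolute drop in percentage points: in the hypothetical case above, moving from 10.0% to 8.0% EER is a 2-point absolute drop but a 20% relative reduction.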
Abstract
Detecting singing voice deepfakes, or SingFake, involves determining the authenticity and copyright of a singing voice. Existing models for speech deepfake detection have struggled to adapt to unseen attacks in this unique singing voice domain of human vocalization. To bridge the gap, we present a groundbreaking SingGraph model. The model synergizes the capabilities of the MERT acoustic music understanding model for pitch and rhythm analysis with the wav2vec2.0 model for linguistic analysis of lyrics. Additionally, we advocate for using RawBoost and beat matching techniques grounded in music domain knowledge for singing voice augmentation, thereby enhancing SingFake detection performance. Our proposed method achieves new state-of-the-art (SOTA) results within the SingFake dataset, surpassing the previous SOTA model across three distinct scenarios: it improves EER relatively for seen…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
