RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection
Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan

TL;DR
RawBMamba is an end-to-end bidirectional state space model that effectively captures both short- and long-range features for audio deepfake detection, significantly improving performance over previous models.
Contribution
The paper introduces RawBMamba, a novel bidirectional state space model that combines local and global features for improved audio deepfake detection.
Findings
34.1% improvement over Rawformer on ASVspoof2021 LA dataset
Effective integration of short- and long-range features
Competitive performance on multiple datasets
Abstract
Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepfake detection. Specifically, we use a sinc layer and multiple convolutional layers to capture short-range features, and then design a bidirectional Mamba to address Mamba's unidirectional modelling problem and further capture long-range feature information. Moreover, we develop a bidirectional fusion module to integrate embeddings, enhancing audio context representation and combining short- and long-range information. The results show that our proposed RawBMamba achieves a 34.1% improvement over…
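To make the "unidirectional modelling problem" concrete, here is a minimal sketch of a bidirectional scan. It is not the authors' implementation. It assumes a toy scalar linear recurrence h_t = a*h_(t-1) + b*x_t in place of a real Mamba block (which uses input-dependent, selective parameters), and it assumes element-wise summation as the fusion step, since the abstract does not specify how the fusion module combines the two directions. The names ssm_scan and bidirectional_scan and the parameters a and b are hypothetical.

```python
from typing import List


def ssm_scan(x: List[float], a: float = 0.9, b: float = 0.1) -> List[float]:
    """Causal linear recurrence h_t = a * h_{t-1} + b * x_t.

    Toy stand-in for a unidirectional state space scan (assumption:
    fixed scalar a and b, unlike Mamba's input-dependent parameters).
    """
    h = 0.0
    out = []
    for x_t in x:
        h = a * h + b * x_t
        out.append(h)
    return out


def bidirectional_scan(x: List[float], a: float = 0.9, b: float = 0.1) -> List[float]:
    """Run the scan left-to-right and right-to-left, then fuse.

    Fusion by element-wise sum is an assumption made for illustration.
    """
    forward = ssm_scan(x, a, b)
    # Reverse the input, scan, then reverse the output back into time order.
    backward = ssm_scan(x[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
forward_only = ssm_scan(impulse, a=0.5, b=1.0)
fused = bidirectional_scan(impulse, a=0.5, b=1.0)
```

With an impulse in the middle of the sequence, forward_only is zero at every position before the impulse, because a causal scan cannot see future samples. The fused output is non-zero on both sides of the impulse, which is the property a bidirectional design is meant to provide.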
Taxonomy
Topics: Music and Audio Processing · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis
