BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset
Istiaq Ahmed Fahad, Kamruzzaman Asif, Sifat Sikder

TL;DR
This paper introduces BanglaFake, a Bengali deepfake audio dataset of 12,260 real and 13,260 synthetic utterances with high-quality synthetic speech, intended to support deepfake detection research for low-resource languages.
Contribution
The creation and evaluation of BanglaFake, a novel Bengali deepfake audio dataset with over 25,000 utterances generated using state-of-the-art TTS models.
Findings
Synthetic speech scores a Robust-MOS of 3.40 for naturalness and 4.01 for intelligibility, as rated by 30 native speakers
t-SNE visualization of MFCCs shows that real and fake audio are hard to tell apart
Dataset addresses resource limitations in Bengali deepfake detection
Abstract
Deepfake audio detection is challenging for low-resource languages like Bengali due to limited datasets and subtle acoustic features. To address this, we introduce BanglaFake, a Bengali Deepfake Audio Dataset with 12,260 real and 13,260 deepfake utterances. Synthetic speech is generated using state-of-the-art (SOTA) Text-to-Speech (TTS) models, ensuring high naturalness and quality. We evaluate the dataset through both qualitative and quantitative analyses. Mean Opinion Score (MOS) ratings from 30 native speakers show a Robust-MOS of 3.40 (naturalness) and 4.01 (intelligibility). t-SNE visualization of MFCCs highlights the challenges of differentiating real from fake audio. This dataset serves as a crucial resource for advancing deepfake detection in Bengali, addressing the limitations of low-resource language research.
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Hate Speech and Cyberbullying Detection
