Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection
Yongqiang Dou, Haocheng Yang, Maolin Yang, Yanyan Xu, Dengfeng Ke

TL;DR
This paper introduces D3M, a novel training approach using balanced focal loss to improve anti-spoofing in speaker verification, especially for indistinguishable samples, achieving state-of-the-art results on the ASVspoof2019 dataset.
Contribution
It proposes a balanced focal loss function for training anti-spoofing models, addressing data discrepancy issues and enhancing detection of challenging samples.
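As a rough illustration of how a balanced focal loss rescales the loss per sample, here is a minimal sketch. It assumes the standard alpha-balanced focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), which may differ in detail from the formulation used in the paper. The function names, the choice of label 1 as the positive class, and the values alpha=0.25 and gamma=2.0 are illustrative assumptions, not settings taken from the paper.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Binary cross-entropy for one sample.

    p: predicted probability of the positive class (label 1)
    y: ground-truth label, 0 or 1
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
    return -math.log(p_t)


def balanced_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Alpha-balanced focal loss for one sample (assumed standard form).

    alpha weights the positive class (1 - alpha weights the negative class);
    gamma controls how strongly well-classified samples are down-weighted.
    Both defaults are illustrative, not the paper's settings.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks toward 0 as the sample becomes easy (p_t -> 1)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Compare an easily classified sample with a hard one, both with label 1
    for name, p in [("easy", 0.95), ("hard", 0.30)]:
        ce = cross_entropy(p, 1)
        fl = balanced_focal_loss(p, 1)
        print(f"{name}: CE={ce:.4f}  focal={fl:.6f}  focal/CE={fl / ce:.6f}")
```

With gamma set to 0 the modulating factor disappears and the loss reduces to class-weighted cross-entropy. Larger gamma values shift more of the total loss toward samples the model still classifies poorly.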
Findings
Balanced focal loss outperforms cross-entropy loss in anti-spoofing tasks.
Fusion of three feature types surpasses more complex models in performance.
Method maintains effectiveness on real replay data, indicating robustness.
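The summary does not say how the three feature types are fused. One common option is score-level fusion, where a separate detector is trained per feature type and their per-utterance scores are averaged. The sketch below shows that option only; the function name, the weighting scheme, and the example scores are hypothetical and not taken from the paper.

```python
def fuse_scores(scores_per_model, weights=None):
    """Weighted average of per-utterance scores from several detectors.

    scores_per_model: list of score lists, one list per detector,
                      all covering the same utterances in the same order
    weights: optional per-detector weights; equal weights if omitted
    """
    n_models = len(scores_per_model)
    if n_models == 0:
        raise ValueError("need at least one detector")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per detector is required")

    n_utts = len(scores_per_model[0])
    if any(len(s) != n_utts for s in scores_per_model):
        raise ValueError("all detectors must score the same utterances")

    total = sum(weights)
    # Normalize by the weight sum so fused scores stay on the input scale
    return [
        sum(w * s[i] for w, s in zip(weights, scores_per_model)) / total
        for i in range(n_utts)
    ]


if __name__ == "__main__":
    # Made-up scores from three hypothetical detectors for two utterances
    fused = fuse_scores([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]])
    print(fused)
```

Equal weights are the simplest choice. If weights are tuned instead, tuning them on a development set keeps the evaluation set untouched.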
Abstract
The advancement of high-quality playback devices makes it urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, but the lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that anti-spoofing models should pay more attention to indistinguishable samples than to easily classified ones during training, so that correct discrimination is the top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, which uses a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of each sample. In the experiments, we select three kinds of features that contain…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Focal Loss
