Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks
Xu Li, Xixin Wu, Hui Lu, Xunying Liu, Helen Meng

TL;DR
This paper introduces a channel-wise gated Res2Net model that dynamically suppresses less relevant channels to improve the detection of unseen synthetic speech attacks in speaker verification systems.
Contribution
The work proposes a novel channel-wise gating mechanism integrated into Res2Net, enhancing its ability to generalize to unseen spoofing attacks in anti-spoofing tasks.
Findings
CG-Res2Net outperforms Res2Net on unseen attacks
The method surpasses state-of-the-art single systems
Significant improvement in detection performance on ASVspoof 2019 LA
Abstract
Existing approaches for anti-spoofing in automatic speaker verification (ASV) still lack generalizability to unseen attacks. The Res2Net approach designs a residual-like connection between feature groups within one block, which increases the possible receptive fields and improves the system's detection generalizability. However, such a residual-like connection is performed by a direct addition between feature groups without channel-wise priority. We argue that the information across channels may not contribute to spoofing cues equally, and the less relevant channels are expected to be suppressed before adding onto the next feature group, so that the system can generalize better to unseen attacks. This argument motivates the current work, which presents a novel channel-wise gated Res2Net (CG-Res2Net) that modifies Res2Net to enable a channel-wise gating mechanism in the connection…
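The abstract describes the mechanism only in words (and is truncated here), so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It shows the hierarchical residual-like connection inside one Res2Net block, y_1 = x_1, y_2 = K_2(x_2), y_i = K_i(x_i + y_{i-1}), and a gated variant in which y_{i-1} is multiplied channel-wise by weights in (0, 1) before the addition. Assumptions: the gate is taken to have a squeeze-and-excitation form (global average pooling over time, a two-layer bottleneck, sigmoid); `group_transform` is a hypothetical stand-in for the per-group convolution, batch normalization and ReLU; all function names are illustrative. The 1x1 convolutions around the split and the block's outer residual connection are omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_gate(y, w1, w2):
    """Assumed SE-style gate: GAP over time, bottleneck + ReLU, sigmoid.

    y: (channels, time) feature group. Returns per-channel weights in (0, 1).
    """
    s = y.mean(axis=1)               # global average pooling over time
    h = np.maximum(w1 @ s, 0.0)      # bottleneck projection + ReLU
    return sigmoid(w2 @ h)           # one weight per channel


def group_transform(x, k):
    """Hypothetical stand-in for the per-group conv + BN + ReLU: channel mixing + ReLU."""
    return np.maximum(k @ x, 0.0)


def cg_res2net_hierarchy(x, scale, kernels, gates=None):
    """Residual-like connections between feature groups inside one block.

    x: (channels, time) with channels divisible by `scale`.
    kernels: `scale` square matrices; kernels[0] is unused because y_1 = x_1.
    gates: (w1, w2) pairs, one per connection (scale - 2 of them);
           None gives the plain Res2Net direct addition.
    """
    groups = np.split(x, scale, axis=0)
    outputs = [groups[0]]            # y_1 = x_1 passes through unchanged
    for i in range(1, scale):
        inp = groups[i]
        if i > 1:                    # groups 3..s receive the previous output
            prev = outputs[-1]
            if gates is not None:
                w1, w2 = gates[i - 2]
                # Suppress less relevant channels before adding onto the next group.
                prev = channel_gate(prev, w1, w2)[:, None] * prev
            inp = inp + prev
        outputs.append(group_transform(inp, kernels[i]))
    return np.concatenate(outputs, axis=0)


# Usage with random weights (shapes only; no trained parameters).
rng = np.random.default_rng(0)
scale, width, T = 4, 8, 50
x = rng.standard_normal((scale * width, T))
kernels = [0.1 * rng.standard_normal((width, width)) for _ in range(scale)]
gates = [(rng.standard_normal((2, width)), rng.standard_normal((width, 2)))
         for _ in range(scale - 2)]
y_plain = cg_res2net_hierarchy(x, scale, kernels)
y_gated = cg_res2net_hierarchy(x, scale, kernels, gates)
print(y_plain.shape, y_gated.shape)  # (32, 50) (32, 50)
```

Passing `gates=None` reproduces the plain Res2Net addition, which isolates the one structural change: only the term carried from one feature group to the next is reweighted, while the groups themselves and their transforms stay the same.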
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: 1x1 Convolution · Kaiming Initialization · Convolution · Batch Normalization · Residual Connection · Average Pooling · Res2Net Block · Global Average Pooling · Res2Net
