
TL;DR
This paper introduces GaborNet, an ingestion layer based on a bank of Gabor filters, integrates it into the RawNet2 and RawGAT-ST architectures to improve audio spoof detection, and explores several audio augmentation techniques.
Contribution
It presents GaborNet, a novel Gabor filter bank layer, and evaluates it, together with several augmentation strategies, within existing audio spoof detection architectures.
Findings
GaborNet enhances audio spoof detection performance.
Audio augmentation improves robustness against distortions.
Modifications such as the squared modulus and Gaussian Lowpass Pooling are effective.
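The two modifications named above both act on the complex outputs of a Gabor filter bank. The following is a minimal NumPy sketch, not the paper's implementation: it takes the squared modulus to obtain a real energy envelope, then applies a per-channel Gaussian lowpass filter followed by decimation. The window width, stride and sigma are hypothetical defaults (401 taps and a stride of 160 samples, roughly 25 ms and 10 ms at 16 kHz), and the helper names are illustrative.

```python
import numpy as np

def squared_modulus(z):
    """Turn complex filter outputs into a real, non-negative energy envelope |z|^2."""
    return np.real(z) ** 2 + np.imag(z) ** 2

def gaussian_window(width, sigma):
    """Gaussian window of `width` taps, normalized to unit sum; sigma is in samples (assumed)."""
    n = np.arange(width) - (width - 1) / 2
    w = np.exp(-0.5 * (n / sigma) ** 2)
    return w / w.sum()

def gaussian_lowpass_pool(x, width=401, stride=160, sigma=100.0):
    """Smooth each channel with a Gaussian lowpass filter, then keep every `stride`-th sample.

    x has shape (channels, time); the result has shape (channels, ceil(time / stride)).
    Hypothetical parameters; a learnable frontend would typically learn sigma per channel.
    """
    w = gaussian_window(width, sigma)
    smoothed = np.stack([np.convolve(ch, w, mode="same") for ch in x])
    return smoothed[:, ::stride]

# Usage: random complex "filter outputs" for 4 channels of 1 s of 16 kHz audio.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 16000)) + 1j * rng.standard_normal((4, 16000))
feats = gaussian_lowpass_pool(squared_modulus(z))
print(feats.shape)
```

Pooling with a Gaussian window rather than plain max pooling keeps the downsampling step itself a smooth lowpass filter, which limits aliasing when the envelope is decimated.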
Abstract
One direction of development in feature extraction from audio signals is based on processing raw samples in the time domain. This approach appears to be effective, especially in the era of neural networks. One example is SincNet, in which the core of the neural network layer is a set of sinc functions convolved with the input signal. Because the sinc functions have finite length, distortions appear in the frequency domain of the convolved signal, just as when a signal is windowed. Recently, a new approach has been developed that replaces the sinc functions with Gabor filters. Because the outputs of these filters are complex-valued, further modifications had to be applied, such as the squared modulus or Gaussian Lowpass Pooling. In this work, an ingestion layer based on a bank of Gabor filters, named GaborNet, and its modifications are extensively examined within the popular RawNet2…
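To make the point about finite-length sinc functions concrete, the sketch below builds a SincNet-style band-pass filter as the difference of two truncated ideal low-pass sincs and compares it with a complex Gabor filter (a Gaussian envelope times a complex exponential). It is an illustration under assumed parameters (normalized frequencies in cycles per sample, 251 taps, an optional Hamming taper, sigma of 20 samples), not the paper's or SincNet's code.

```python
import numpy as np

def sinc_bandpass(f_low, f_high, length=251, window=True):
    """Band-pass impulse response: difference of two truncated ideal low-pass sincs.

    np.sinc(x) = sin(pi x) / (pi x); frequencies are normalized (0 < f < 0.5).
    """
    n = np.arange(length) - (length - 1) / 2
    h = 2 * f_high * np.sinc(2 * f_high * n) - 2 * f_low * np.sinc(2 * f_low * n)
    if window:
        h = h * np.hamming(length)  # tapering reduces, but does not remove, truncation ripple
    return h

def gabor(f_center, sigma, length=251):
    """Complex Gabor impulse response with a unit-area Gaussian envelope.

    Its magnitude response is approximately a Gaussian centred at f_center with unit peak gain.
    """
    n = np.arange(length) - (length - 1) / 2
    envelope = np.exp(-0.5 * (n / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return envelope * np.exp(2j * np.pi * f_center * n)

def magnitude_response(h, nfft=4096):
    """Return (frequencies in cycles/sample, |H(f)|) restricted to [0, 0.5)."""
    H = np.fft.fft(h, nfft)
    freqs = np.fft.fftfreq(nfft)
    keep = (freqs >= 0) & (freqs < 0.5)
    return freqs[keep], np.abs(H[keep])

# Compare stop-band leakage of a plainly truncated sinc filter with a tapered one,
# and locate the peak of a Gabor filter centred in the same band.
f, H_rect = magnitude_response(sinc_bandpass(0.05, 0.10, window=False))
_, H_hamm = magnitude_response(sinc_bandpass(0.05, 0.10, window=True))
_, G = magnitude_response(gabor(0.075, sigma=20.0))
stop = (f < 0.03) | (f > 0.12)
print("max stop-band gain, rectangular truncation:", H_rect[stop].max())
print("max stop-band gain, Hamming taper:", H_hamm[stop].max())
print("Gabor peak at f =", f[np.argmax(G)], "with gain", G.max())
```

The plainly truncated sinc filter leaks noticeably more energy outside its band than the tapered one, which is the windowing effect described above; the Gabor filter avoids a hard cut-off altogether because its Gaussian envelope already decays smoothly, at the cost of producing complex outputs.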
