Efficient dynamic filter for robust and low computational feature extraction
Donghyeon Kim, Gwantae Kim, Bokyeung Lee, Jeong-gi Kwak, David K. Han, Hanseok Ko

TL;DR
This paper introduces an efficient dynamic filter that improves noise robustness and computational efficiency in feature extraction for speech tasks by using chunk-based processing and dynamic attention pooling.
Contribution
The paper proposes a novel dynamic filter with chunk-based feature separation and dynamic attention pooling to enhance performance in unseen noise and speaker environments.
Findings
Outperforms state-of-the-art models in unseen noise scenarios
Improves robustness in speaker verification and keyword spotting
Reduces computational resources compared to previous methods
Abstract
An unseen noise signal, one not considered during model training, is difficult to anticipate and can degrade performance. Various methods have been investigated to mitigate unseen noise. In our previous work, an Instance-level Dynamic Filter (IDF) and a Pixel Dynamic Filter (PDF) were proposed to extract noise-robust features. However, the performance of the dynamic filter might be degraded because the IDF part uses simple feature pooling to reduce computational cost. In this paper, we propose an efficient dynamic filter to address this. Instead of using the simple feature mean, we separate Time-Frequency (T-F) features into non-overlapping chunks and apply separable convolutions along each feature direction (inter-chunk and intra-chunk). Additionally, we propose Dynamic Attention Pooling that maps high…
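The chunking step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the choice to chunk along the time axis, the dropping of leftover frames, and the use of plain averaging in place of the learned separable convolutions are all assumptions made only to show what the "intra-chunk" and "inter-chunk" directions refer to.

```python
from typing import List

Matrix = List[List[float]]  # time frames x frequency bins


def split_into_chunks(features: Matrix, chunk_len: int) -> List[Matrix]:
    """Split a T x F feature map along time into non-overlapping chunks.

    Assumption: trailing frames that do not fill a whole chunk are dropped.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    n_chunks = len(features) // chunk_len
    return [features[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


def intra_chunk_mean(chunks: List[Matrix]) -> Matrix:
    """Average over the frames inside each chunk: one F-vector per chunk.

    Stand-in for an operation along the intra-chunk direction.
    """
    return [[sum(col) / len(chunk) for col in zip(*chunk)] for chunk in chunks]


def inter_chunk_mean(chunks: List[Matrix]) -> Matrix:
    """Average across chunks at each within-chunk position: chunk_len x F.

    Stand-in for an operation along the inter-chunk direction.
    """
    n = len(chunks)
    return [[sum(vals) / n for vals in zip(*frames)] for frames in zip(*chunks)]


if __name__ == "__main__":
    # Toy feature map: 6 time frames, 2 frequency bins.
    feats = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0],
             [6.0, 7.0], [8.0, 9.0], [10.0, 11.0]]
    chunks = split_into_chunks(feats, chunk_len=3)
    print(intra_chunk_mean(chunks))  # [[2.0, 3.0], [8.0, 9.0]]
    print(inter_chunk_mean(chunks))  # [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
```

In the toy example, the intra-chunk result summarizes each chunk on its own, while the inter-chunk result compares the same position across chunks. A learned filter would replace the averages with trainable weights along those same two axes.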
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Attention Pooling
