Efficient dynamic filter for robust and low computational feature extraction

Donghyeon Kim; Gwantae Kim; Bokyeung Lee; Jeong-gi Kwak; David K. Han; Hanseok Ko

arXiv:2205.01304 · eess.AS · October 24, 2022


Open Access

TL;DR

This paper introduces an efficient dynamic filter that improves noise robustness and computational efficiency in feature extraction for speech tasks by using chunk-based processing and dynamic attention pooling.

Contribution

The paper proposes a novel dynamic filter with chunk-based feature separation and dynamic attention pooling to enhance performance in unseen noise and speaker environments.
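The abstract below is truncated before it fully defines Dynamic Attention Pooling, so the sketch here is not the authors' method. It shows generic softmax attention pooling over time and contrasts it with the simple mean pooling that the abstract says the earlier IDF relied on. The function names and the scoring vector `w` are hypothetical. In a trained model `w` would be a learned parameter.

```python
import numpy as np


def mean_pool(x: np.ndarray) -> np.ndarray:
    """Simple pooling: average a (T, F) feature map over time -> (F,)."""
    return x.mean(axis=0)


def attention_pool(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Generic softmax attention pooling over time (illustrative only).

    x: (T, F) time-frequency features.
    w: (F,) hypothetical scoring vector; each frame gets one scalar score.
    Returns the (F,) softmax-weighted average of the frames.
    """
    scores = x @ w                      # (T,) one score per frame
    scores = scores - scores.max()      # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()   # softmax over the time axis
    return weights @ x                  # (F,) weighted sum of frames


x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(mean_pool(x))                     # plain average of the three frames
print(attention_pool(x, np.zeros(2)))   # zero scores -> uniform weights
```

With all-zero scores the weights are uniform and attention pooling reduces to the mean. Non-zero scores let informative frames dominate the pooled vector, which is the general motivation for replacing a fixed mean with a learned weighting.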

Findings

01 Outperforms state-of-the-art models in unseen noise scenarios

02 Improves robustness in speaker verification and keyword spotting

03 Requires fewer computational resources than previous methods

Abstract

Unseen noise, i.e., noise not considered during model training, is difficult to anticipate and can lead to performance degradation. Various methods have been investigated to mitigate unseen noise. In our previous work, an Instance-level Dynamic Filter (IDF) and a Pixel Dynamic Filter (PDF) were proposed to extract noise-robust features. However, the performance of the dynamic filter might be degraded because simple feature pooling is used in the IDF part to reduce computational cost. In this paper, we propose an efficient dynamic filter to improve its performance. Instead of using the simple feature mean, we separate Time-Frequency (T-F) features into non-overlapping chunks and apply separable convolutions along each feature direction (inter-chunk and intra-chunk). Additionally, we propose Dynamic Attention Pooling that maps high…
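The chunk-based processing described in the abstract can be sketched as follows. The sketch rests on several assumptions not stated in the abstract. It chunks along the time axis only, zero-pads the last chunk, and uses fixed per-feature (depthwise) 1-D kernels in place of whatever the paper learns or generates dynamically. All function names are hypothetical. A (T, F) map is reshaped to (N, K, F). One 1-D convolution runs inside each chunk (intra-chunk), then another runs across chunks at the same position (inter-chunk).

```python
import numpy as np


def to_chunks(x: np.ndarray, chunk: int) -> np.ndarray:
    """Split a (T, F) map into non-overlapping time chunks -> (N, chunk, F).

    Zero-pads T up to a multiple of `chunk` (an assumption of this sketch).
    """
    T, F = x.shape
    pad = (-T) % chunk
    x = np.pad(x, ((0, pad), (0, 0)))
    return x.reshape(-1, chunk, F)


def depthwise_conv1d(x: np.ndarray, kernels: np.ndarray, axis: int) -> np.ndarray:
    """Per-feature 1-D cross-correlation along `axis` with 'same' zero padding.

    x: array whose last axis is the feature axis F (axis must not be the last).
    kernels: (F, k) with odd k, one kernel per feature channel.
    """
    _, k = kernels.shape
    x = np.moveaxis(x, axis, 0)               # put the conv axis first
    L = x.shape[0]
    half = k // 2
    xp = np.pad(x, [(half, half)] + [(0, 0)] * (x.ndim - 1))
    out = np.zeros(x.shape, dtype=float)
    for j in range(k):
        out += xp[j:j + L] * kernels[:, j]    # broadcasts over the F axis
    return np.moveaxis(out, 0, axis)


def chunk_separable_filter(x, chunk, k_intra, k_inter):
    """Intra-chunk conv followed by inter-chunk conv on a (T, F) map."""
    c = to_chunks(x, chunk)                   # (N, K, F)
    c = depthwise_conv1d(c, k_intra, axis=1)  # within each chunk
    c = depthwise_conv1d(c, k_inter, axis=0)  # across chunks, same position
    return c


ones = np.ones((8, 1))
box = np.array([[1.0, 1.0, 1.0]])             # 3-tap box kernel
ident = np.array([[0.0, 1.0, 0.0]])           # identity kernel
print(chunk_separable_filter(ones, 4, box, ident)[:, :, 0])
```

The intra-chunk pass never mixes values across a chunk boundary, so each chunk of ones becomes `[2, 3, 3, 2]` independently. Splitting the filtering into two 1-D passes keeps the per-position cost proportional to the sum of the two kernel lengths rather than their product, which is the general appeal of separable convolutions for low-compute feature extraction.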


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Attention Pooling