Frequency & Channel Attention for Computationally Efficient Sound Event Detection
Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Yong-Hwa Park

TL;DR
This paper investigates efficient attention mechanisms in frequency and channel dimensions to improve sound event detection performance while maintaining low computational costs, comparing novel and existing methods.
Contribution
It introduces a lightweight attention approach combining SE and tfwSE modules that achieves comparable performance to more complex models with significantly fewer parameters.
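To make the two attention types concrete, here is a minimal pure-Python sketch of a squeeze-and-excitation gate applied first over channels (SE) and then over frequency bins for each time frame, which is how we read "tfwSE". It is not the authors' implementation. The assumptions are ours: the tensor layout `x[c][f][t]`, the reduction ratio `r`, bias-free layers, random weights, averaging over channels in the frequency gate, and applying channel attention before frequency attention.

```python
import math
import random

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense(vec, weights, act):
    # weights holds one row per output unit; biases are omitted for brevity.
    return [act(sum(w * v for w, v in zip(row, vec))) for row in weights]

def gate(vec, w1, w2):
    # Excitation: bottleneck MLP ending in a sigmoid, so every weight lies in (0, 1).
    return dense(dense(vec, w1, relu), w2, sigmoid)

def se_channel(x, w1, w2):
    # Channel attention: squeeze over frequency and time, one weight per channel.
    C, F, T = len(x), len(x[0]), len(x[0][0])
    squeezed = [sum(sum(row) for row in x[c]) / (F * T) for c in range(C)]
    g = gate(squeezed, w1, w2)
    return [[[x[c][f][t] * g[c] for t in range(T)] for f in range(F)]
            for c in range(C)]

def se_frequency_per_frame(x, w1, w2):
    # Frequency attention per time frame (assumed reading of tfwSE):
    # squeeze over channels, then one weight per frequency bin in that frame.
    C, F, T = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * T for _ in range(F)] for _ in range(C)]
    for t in range(T):
        squeezed = [sum(x[c][f][t] for c in range(C)) / C for f in range(F)]
        g = gate(squeezed, w1, w2)
        for c in range(C):
            for f in range(F):
                out[c][f][t] = x[c][f][t] * g[f]
    return out

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: 4 channels, 6 frequency bins, 5 frames, reduction ratio 2.
rng = random.Random(0)
C, F, T, r = 4, 6, 5, 2
x = [[[rng.uniform(0.0, 1.0) for _ in range(T)] for _ in range(F)]
     for _ in range(C)]
y = se_channel(x, rand_matrix(C // r, C, rng), rand_matrix(C, C // r, rng))
z = se_frequency_per_frame(y, rand_matrix(F // r, F, rng),
                           rand_matrix(F, F // r, rng))
```

In this sketch the channel gate adds 2·C²/r weights and the frequency gate 2·F²/r, which suggests why such modules can stay small next to the convolution layers they accompany.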
Findings
The combined SE and tfwSE attention method performs comparably to FDY conv while adding only 2.7% more parameters than the baseline model.
Lightweight attention methods can match state-of-the-art SED performance.
Class-wise analysis reveals different attention methods' strengths and characteristics.
Abstract
We explore various attention methods on the frequency and channel dimensions for sound event detection (SED), aiming to enhance performance with minimal increase in computational cost while leveraging domain knowledge to address the frequency dimension of audio data. In previous work, we introduced frequency dynamic convolution (FDY conv) to alleviate the translational equivariance issue associated with 2D convolution on the frequency dimension of 2D audio data. Although this approach demonstrated state-of-the-art SED performance, it resulted in a model with 150% more trainable parameters. To achieve comparable SED performance with computationally efficient methods for practicality, we explore lighter alternative attention methods. In addition, we focus on attention methods applied to the frequency and channel dimensions. Joint application of the squeeze-and-excitation (SE) module and…
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
