Attentive max feature map and joint training for acoustic scene classification

Hye-jin Shim; Jee-weon Jung; Ju-ho Kim; Ha-Jin Yu

arXiv:2104.07213 · cs.LG · December 24, 2021

TL;DR

This paper introduces an attentive max feature map and explores joint training methods for acoustic scene classification. The resulting system achieves state-of-the-art single-system performance on Subtask A of the DCASE 2020 challenge with relatively few parameters, and the authors' team placed fourth in the DCASE 2021 challenge.

Contribution

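The paper proposes an attentive max feature map, which combines attention with a max feature map so that attention is less likely to discard potentially valuable information. It also explores joint training methods, including multi-task learning, that assign additional abstract labels to each audio recording.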

Findings

01. Achieved state-of-the-art performance for single systems on Subtask A of the DCASE 2020 challenge

02. Placed fourth in the DCASE 2021 challenge using the proposed attentive max feature map

03. Reached these results with relatively fewer parameters

Abstract

Various attention mechanisms are being widely applied to acoustic scene classification. However, we empirically found that the attention mechanism can excessively discard potentially valuable information, despite improving performance. We propose the attentive max feature map that combines two effective techniques, attention and a max feature map, to further elaborate the attention mechanism and mitigate the above-mentioned phenomenon. We also explore various joint training methods, including multi-task learning, that allocate additional abstract labels for each audio recording. Our proposed system demonstrates state-of-the-art performance for single systems on Subtask A of the DCASE 2020 challenge by applying the two proposed techniques using relatively fewer parameters. Furthermore, adopting the proposed attentive max feature map, our team placed fourth in the recent DCASE 2021…
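As a rough illustration of the two ingredients named in the abstract, the sketch below implements a plain max feature map (split the channels into two halves and keep the element-wise maximum) and then a hypothetical gated variant that scales the input by a sigmoid attention mask before the maximum is taken. The gated variant is an assumption made for illustration only: the abstract does not specify how attention and the max feature map are combined, so `gated_max_feature_map` and its `gate_logits` argument are invented names and should not be read as the authors' formulation.

```python
import numpy as np


def max_feature_map(x: np.ndarray) -> np.ndarray:
    """Plain max feature map over the channel axis (axis 0).

    The channels are split into two equal halves and the element-wise
    maximum of the halves is returned, so the channel count is halved.
    """
    if x.shape[0] % 2 != 0:
        raise ValueError("channel dimension must be even")
    first_half, second_half = np.split(x, 2, axis=0)
    return np.maximum(first_half, second_half)


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def gated_max_feature_map(x: np.ndarray, gate_logits: np.ndarray) -> np.ndarray:
    """Hypothetical combination of attention and a max feature map.

    Assumption (not taken from the paper): each element of x is scaled by
    a sigmoid gate in (0, 1) and the two channel halves then compete
    through the element-wise maximum.
    """
    if x.shape != gate_logits.shape:
        raise ValueError("x and gate_logits must have the same shape")
    return max_feature_map(x * sigmoid(gate_logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy feature map: 4 channels, 3 frequency bins, 5 time frames.
    features = rng.standard_normal((4, 3, 5))
    logits = rng.standard_normal((4, 3, 5))
    print(max_feature_map(features).shape)
    print(gated_max_feature_map(features, logits).shape)
```

In this sketch the maximum lets a value suppressed by the gate in one half be replaced by its counterpart in the other half, which is one plausible way to avoid discarding information outright; whether the paper uses this mechanism is not stated in the abstract.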

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis