Cross-scale Attention Model for Acoustic Event Classification

Xugang Lu; Peng Shen; Sheng Li; Yu Tsao; Hisashi Kawai

arXiv:1912.12011 · cs.SD · June 17, 2020 · 1 citation

TL;DR

This paper introduces a cross-scale attention model for acoustic event classification that combines features from different scales using attention mechanisms, improving the detection of both short- and long-duration sounds.

Contribution

The paper proposes a novel cross-scale attention model that explicitly integrates multi-scale features with attention weighting, enhancing acoustic event classification performance.
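
The attention-weighted fusion described above can be illustrated with a minimal sketch. It assumes each scale has already been pooled to a fixed-length feature vector, and it scores each scale with a single hypothetical vector `score_vector`; the paper's actual scoring function, layer choices, and dimensions are not given here, so the function names and values below are illustrative assumptions only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_scale_fuse(scale_features, score_vector):
    """Fuse per-scale feature vectors with attention weights.

    scale_features: equal-length vectors, one per scale (assumed here to be
        pooled outputs of shallow, middle, and deep CNN layers).
    score_vector: hypothetical learned vector; each scale's score is the
        dot product of its feature vector with this vector.
    Returns (fused_vector, weights).
    """
    scores = [
        sum(f * w for f, w in zip(feat, score_vector))
        for feat in scale_features
    ]
    weights = softmax(scores)  # one weight per scale, summing to 1
    dim = len(score_vector)
    fused = [
        sum(a * feat[d] for a, feat in zip(weights, scale_features))
        for d in range(dim)
    ]
    return fused, weights


# Toy input: three scales, three feature dimensions (made-up values).
features = [
    [1.0, 0.0, 0.0],  # short-range (bottom-layer) features
    [0.0, 1.0, 0.0],  # mid-range features
    [0.0, 0.0, 1.0],  # long-range (top-layer) features
]
fused, weights = cross_scale_fuse(features, [0.0, 0.0, 0.0])
```

With an all-zero `score_vector` every scale receives the same weight, so the fused vector is the plain average of the three scales; a trained scoring vector would instead shift weight toward whichever scale is most informative for a given clip.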

Findings

01. Improved classification accuracy on an urban AEC dataset

02. Enhanced detection of short- and long-duration acoustic events

03. Model outperforms existing state-of-the-art methods

Abstract

A major advantage of a deep convolutional neural network (CNN) is that the focused receptive field size is increased by stacking multiple convolutional layers. Accordingly, the model can explore the long-range dependency of features from the top layers. However, a potential limitation of the network is that the discriminative features from the bottom layers (which can model the short-range dependency) are smoothed out in the final representation. This limitation is especially evident in the acoustic event classification (AEC) task, where both short- and long-duration events are involved in an audio clip and need to be classified. In this paper, we propose a cross-scale attention (CSA) model, which explicitly integrates features from different scales to form the final representation. Moreover, we propose the adoption of the attention mechanism to specify the weights of local and global…
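
The abstract's opening point, that stacking convolutional layers enlarges the receptive field, can be checked with the standard recurrence r_l = r_(l-1) + (k_l - 1) * j_(l-1), where j is the cumulative stride. The sketch below applies it to a hypothetical four-layer stack; the kernel sizes and strides are assumptions for illustration, not the paper's architecture.

```python
def receptive_field(layers):
    """Return the receptive field size after each layer.

    layers: list of (kernel_size, stride) pairs, ordered from input to top.
    """
    rf, jump = 1, 1  # a single input sample sees itself; unit spacing
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra tap widens the view by `jump`
        jump *= stride             # striding spreads later taps further apart
        sizes.append(rf)
    return sizes


# Hypothetical stack: four layers with kernel size 3, two of them strided.
stack = [(3, 1), (3, 2), (3, 1), (3, 2)]
print(receptive_field(stack))  # [3, 5, 9, 13]
```

In this toy stack the bottom layer covers 3 input frames while the top layer covers 13, which is the gap between short-range and long-range views that the abstract refers to.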

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing