A Multi-Channel Temporal Attention Convolutional Neural Network Model for Environmental Sound Classification

You Wang; Chuyao Feng; David V. Anderson

arXiv:2011.02561·eess.AS·November 6, 2020

TL;DR

This paper introduces a multi-channel temporal attention convolutional neural network that enhances environmental sound classification by effectively capturing channel-specific temporal features, outperforming existing models on standard datasets.

Contribution

The paper proposes a novel multi-channel temporal attention block within CNNs, enabling better exploitation of temporal information across channels for sound classification.

Findings

01. MCTA outperforms single-channel and non-attention models.

02. Achieves competitive results with lighter networks.

03. Effective on multiple environmental sound datasets.

Abstract

Recently, many attention-based deep neural networks have emerged and achieved state-of-the-art performance in environmental sound classification. The essence of the attention mechanism is to assign contribution weights to different parts of the features, namely channels, spectral or spatial contents, and temporal frames. In this paper, we propose an effective convolutional neural network structure with a multi-channel temporal attention (MCTA) block, which applies a temporal attention mechanism within each channel of the embedded features to extract channel-wise relevant temporal information. This multi-channel temporal attention structure yields a distinct attention vector for each channel, which enables the network to fully exploit the relevant temporal information in different channels. The datasets used to test our model include ESC-50 and its subset ESC-10, along with development…
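
To make the idea of a distinct attention vector per channel concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the scoring rule (a weighted sum over frequency bins), the `weights` parameter, and the function name `mcta` are all hypothetical choices made for illustration, since the abstract does not specify how the attention scores are computed. The sketch only shows the general pattern of computing a separate softmax over time frames for each channel and rescaling that channel's frames accordingly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mcta(features, weights):
    """Per-channel temporal attention (illustrative assumption, not the paper's block).

    features[c][f][t]: embedded features (channel, frequency bin, time frame).
    weights[c][f]:     hypothetical per-channel scoring weights.
    Returns (weighted_features, attention), where attention[c] is a
    distribution over time frames specific to channel c.
    """
    weighted, attention = [], []
    for chan, w in zip(features, weights):
        n_bins, n_frames = len(chan), len(chan[0])
        # One scalar score per time frame, computed within this channel only.
        scores = [sum(w[f] * chan[f][t] for f in range(n_bins))
                  for t in range(n_frames)]
        attn = softmax(scores)
        attention.append(attn)
        # Rescale every frequency bin of frame t by that frame's weight.
        weighted.append([[row[t] * attn[t] for t in range(n_frames)]
                         for row in chan])
    return weighted, attention


# Toy input: 2 channels, 2 frequency bins, 3 time frames.
features = [
    [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
    [[3.0, 2.0, 1.0], [1.0, 0.0, 1.0]],
]
weights = [[1.0, 0.0], [1.0, 0.0]]
weighted, attention = mcta(features, weights)
```

In this toy input the first channel's scores rise over time and the second channel's fall, so the two channels end up emphasizing different frames. A single shared temporal attention vector could not express that difference, which is the motivation the abstract gives for the multi-channel design.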

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Animal Vocal Communication and Behavior