A Multi-Channel Temporal Attention Convolutional Neural Network Model for Environmental Sound Classification
You Wang, Chuyao Feng, David V. Anderson

TL;DR
This paper introduces a multi-channel temporal attention convolutional neural network that enhances environmental sound classification by effectively capturing channel-specific temporal features, outperforming existing models on standard datasets.
Contribution
The paper proposes a novel multi-channel temporal attention block within CNNs, enabling better exploitation of temporal information across channels for sound classification.
Findings
The multi-channel temporal attention (MCTA) model outperforms single-channel and non-attention models.
Achieves competitive results with lighter networks.
Effective on multiple environmental sound datasets.
Abstract
Recently, many attention-based deep neural networks have emerged and achieved state-of-the-art performance in environmental sound classification. The essence of the attention mechanism is assigning contribution weights to different parts of the features, namely channels, spectral or spatial contents, and temporal frames. In this paper, we propose an effective convolutional neural network structure with a multi-channel temporal attention (MCTA) block, which applies a temporal attention mechanism within each channel of the embedded features to extract channel-wise relevant temporal information. This multi-channel temporal attention structure produces a distinct attention vector for each channel, which enables the network to fully exploit the relevant temporal information in different channels. The datasets used to test our model include ESC-50 and its subset ESC-10, along with development…
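The idea of a distinct temporal attention vector per channel can be illustrated with a minimal NumPy sketch. This is not the authors' MCTA block: the feature layout (channels, frequency bins, frames), the scoring rule (mean over frequency), and the function name multi_channel_temporal_attention are all assumptions made for illustration. A real block would most likely learn its scoring function.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def multi_channel_temporal_attention(features):
    """Hypothetical per-channel temporal attention.

    features: array of shape (channels, freq_bins, frames),
              standing in for the embedded features of a CNN.
    Returns the reweighted features and the attention weights.
    """
    # Assumption: score each frame of each channel by its mean over frequency.
    scores = features.mean(axis=1)          # (channels, frames)
    # Softmax over time, separately per channel, so every channel
    # gets its own attention vector over the temporal frames.
    weights = softmax(scores, axis=-1)      # (channels, frames)
    # Broadcast each channel's weights across its frequency bins.
    attended = features * weights[:, None, :]
    return attended, weights

# Toy input: 4 channels, 8 frequency bins, 10 frames.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 10))
attended, weights = multi_channel_temporal_attention(features)
```

In this sketch each row of weights sums to one and differs from channel to channel, in contrast to a single-channel scheme that would share one attention vector of length frames across all channels.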
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Animal Vocal Communication and Behavior
