Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge

Hossein Zeinali, Lukáš Burget, Jan "Honza" Černocký

arXiv:1907.07127 · eess.AS · July 17, 2019 · 6 cites

TL;DR

This paper presents a fusion of three attentive CNN architectures for acoustic scene classification, demonstrating improved performance on the DCASE2019 challenge dataset through multi-model fusion and self-attention mechanisms.

Contribution

It fuses three CNN topologies (a VGG-like 2D CNN, a Light-CNN, and a 1D x-vector CNN), each using self-attention for statistics pooling, for acoustic scene classification.
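
The report says only that submissions fuse several networks "using different fusion strategies". One of the simplest such strategies is score-level fusion, which takes a weighted average of each model's per-class scores. The sketch below illustrates that under stated assumptions. The helpers `fuse_scores` and `predict` are hypothetical, the toy scores are made up, and this is not claimed to be the BUT team's exact fusion method.

```python
from typing import List, Sequence

# Hypothetical layout: model_scores[m][i][c] = score of model m for recording i, class c.
Scores = Sequence[Sequence[Sequence[float]]]


def fuse_scores(model_scores: Scores, weights: Sequence[float]) -> List[List[float]]:
    """Score-level fusion: weighted average of per-model class scores."""
    if len(model_scores) != len(weights):
        raise ValueError("expected one weight per model")
    total = float(sum(weights))
    n_items, n_classes = len(model_scores[0]), len(model_scores[0][0])
    fused = [[0.0] * n_classes for _ in range(n_items)]
    for scores, w in zip(model_scores, weights):
        for i in range(n_items):
            for c in range(n_classes):
                fused[i][c] += (w / total) * scores[i][c]
    return fused


def predict(fused: List[List[float]]) -> List[int]:
    """Pick the highest-scoring class index for each recording."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


# Toy posteriors for 2 recordings x 3 scene classes (made-up numbers).
vgg = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]
lcnn = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
xvec = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]
print(predict(fuse_scores([vgg, lcnn, xvec], [1.0, 1.0, 1.0])))  # [0, 2]
```

With equal weights, the VGG-like model's preference for class 1 on the second recording is outvoted by the other two models. In practice the weights would plausibly be tuned on held-out folds, which is an assumption here.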

Findings

01

Fusion of multiple CNNs improves classification accuracy.

02

Self-attention mechanisms enhance feature pooling.

03

The approach achieves competitive results in the DCASE2019 challenge.

Abstract

This report describes the Brno University of Technology (BUT) team's submissions to Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2019 challenge and provides an analysis of different methods. The proposed approach is a fusion of three Convolutional Neural Network (CNN) topologies. The first is a VGG-like two-dimensional CNN. The second is also a two-dimensional CNN; it uses Max-Feature-Map activation and is called Light-CNN (LCNN). The third is a one-dimensional CNN, used mainly for speaker verification, known as the x-vector topology. All proposed networks use a self-attention mechanism for statistics pooling. As input features, we use 256-dimensional log Mel-spectrograms. Our submissions are fusions of several networks trained on a generated 4-fold evaluation setup, combined using different fusion strategies.
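
Two building blocks named in the abstract can be sketched in a few lines of plain Python.

**Max-Feature-Map (MFM) activation.** This operation splits 2C channel maps into two halves and keeps their element-wise maximum, producing C maps.

**Self-attentive statistics pooling.** The abstract does not give the exact attention formulation. This sketch assumes the common attentive-statistics form, in which each frame score is e_t = vᵀ tanh(W h_t + b), the weights are α = softmax(e), and the output is the α-weighted mean concatenated with the α-weighted standard deviation.

Several details are hypothetical: the parameters `W`, `b`, `v` are untrained placeholders, the function names are illustrative, and a real system would implement these layers in a deep-learning framework.

```python
import math
from typing import List

Matrix = List[List[float]]


def max_feature_map(channels: Matrix) -> Matrix:
    """MFM: element-wise max between channel k and channel k + C (2C maps -> C maps)."""
    if len(channels) % 2:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    return [[max(a, b) for a, b in zip(channels[k], channels[k + half])]
            for k in range(half)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attentive_stats_pooling(frames: Matrix, W: Matrix, b: List[float],
                            v: List[float]) -> List[float]:
    """Pool T x D frame features into a 2D vector [weighted mean, weighted std]."""
    scores = []
    for h in frames:
        # hidden = tanh(W h + b); the frame score is v . hidden
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + bj)
                  for row, bj in zip(W, b)]
        scores.append(sum(vj * zj for vj, zj in zip(v, hidden)))
    alpha = softmax(scores)
    dim = len(frames[0])
    mean = [sum(a * h[d] for a, h in zip(alpha, frames)) for d in range(dim)]
    var = [sum(a * h[d] ** 2 for a, h in zip(alpha, frames)) - mean[d] ** 2
           for d in range(dim)]
    std = [math.sqrt(max(x, 1e-10)) for x in var]  # clamp guards against tiny negatives
    return mean + std


print(max_feature_map([[1.0, 5.0], [2.0, 0.0], [3.0, 1.0], [0.0, 4.0]]))
# [[3.0, 5.0], [2.0, 4.0]]
frames = [[1.0, 2.0], [3.0, 2.0]]  # T = 2 frames, D = 2 dimensions
pooled = attentive_stats_pooling(frames, W=[[0.0, 0.0]], b=[0.0], v=[1.0])
print([round(x, 4) for x in pooled])  # zero W gives uniform weights: [2.0, 2.0, 1.0, 0.0]
```

With all-zero attention parameters, every frame gets the same weight and the layer reduces to plain mean and standard-deviation pooling. Trained parameters would instead let it emphasize the frames that are most informative for the scene.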

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis