Multi-level Attention Model for Weakly Supervised Audio Classification

Changsong Yu; Karim Said Barsim; Qiuqiang Kong; Bin Yang

arXiv:1803.02353 · eess.AS · March 8, 2018 · 63 cites


TL;DR

This paper introduces a multi-level attention model for weakly supervised audio classification, leveraging multiple attention modules at different neural network layers to improve prediction accuracy on large-scale datasets.

Contribution

The paper extends the single-level attention model by applying multiple attention modules at intermediate neural network layers, improving weakly labelled audio classification.

Findings

01 Achieved a mean average precision (mAP) of 0.360 on Audio Set

02 Outperformed the previous single-level attention model (mAP 0.327)

03 Surpassed the Google baseline (mAP 0.314)
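
The findings above are all reported as mean average precision. As a rough illustration of the metric, the sketch below computes uninterpolated average precision per class and averages it over classes. This is the common textbook definition, assumed here; the page does not state which variant the authors used, and the function names and toy data are hypothetical.

```python
def average_precision(labels, scores):
    """AP for one class: mean of the precision measured at each positive clip's rank."""
    # Rank clips from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positive clips contributes 0.0 in this sketch (an assumption).
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(label_matrix, score_matrix):
    """Both matrices hold one row per clip and one column per class."""
    n_classes = len(label_matrix[0])
    aps = []
    for k in range(n_classes):
        labels = [row[k] for row in label_matrix]
        scores = [row[k] for row in score_matrix]
        aps.append(average_precision(labels, scores))
    return sum(aps) / n_classes


# Toy data: 4 clips, 2 classes (illustrative values, not from the paper).
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_map = mean_average_precision(toy_labels, toy_scores)
```

Because each class is scored independently, the metric suits multi-label data such as Audio Set, where one clip can contain several events.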

Abstract

In this paper, we propose a multi-level attention model to solve the weakly labelled audio classification problem. The objective of audio classification is to predict the presence or absence of audio events in an audio clip. Recently, Google published a large-scale weakly labelled dataset called Audio Set, where each audio clip is labelled only with the presence or absence of audio events, without their onset and offset times. Our multi-level attention model is an extension of the previously proposed single-level attention model. It consists of several attention modules applied on intermediate neural network layers. The outputs of these attention modules are concatenated into a vector, which is fed to a multi-label classifier to make the final prediction for each class. Experiments show that our model achieves a mean average precision (mAP) of 0.360, outperforms the…
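
The data flow described in the abstract (attention modules on intermediate layers, concatenation, then a multi-label classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the attention form (a sigmoid-weighted average over time), the layer widths, the random untrained weights, and every function name are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Compute weights @ x + bias for a single feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_linear(rng, n_out, n_in):
    """Random small weights and zero bias; stands in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def attention_pool(frames, cla, att):
    """Collapse T frame vectors into one vector with K entries.

    Assumed form: every frame yields a per-class score and a positive
    attention weight; weights are normalised over time and used to
    average the scores, so no onset/offset labels are needed.
    """
    scores = [[sigmoid(v) for v in linear(*cla, f)] for f in frames]
    weights = [[sigmoid(v) for v in linear(*att, f)] for f in frames]
    pooled = []
    for k in range(len(scores[0])):
        total = sum(w[k] for w in weights)
        pooled.append(sum(s[k] * w[k] for s, w in zip(scores, weights)) / total)
    return pooled


def multi_level_predict(levels, modules, classifier):
    """levels: frame features taken from several intermediate layers."""
    concatenated = []
    for frames, (cla, att) in zip(levels, modules):
        concatenated.extend(attention_pool(frames, cla, att))
    # Multi-label output: one independent sigmoid per class.
    return [sigmoid(v) for v in linear(*classifier, concatenated)]


rng = random.Random(0)
T, K = 10, 5       # frames per clip and number of classes (toy sizes)
dims = [8, 6]      # feature widths of two hypothetical intermediate layers
levels = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(T)] for d in dims]
modules = [(init_linear(rng, K, d), init_linear(rng, K, d)) for d in dims]
classifier = init_linear(rng, K, K * len(dims))
probs = multi_level_predict(levels, modules, classifier)
```

Each attention module returns a fixed-size vector regardless of clip length, which is what allows the outputs from different layers to be concatenated before the final classifier.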


Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Digital Media Forensic Detection