A Global-local Attention Framework for Weakly Labelled Audio Tagging

Helin Wang; Yuexian Zou; Wenwu Wang

arXiv:2102.01931 · eess.AS · February 4, 2021 · 1 cites

TL;DR

This paper introduces a two-stream global-local attention framework for weakly labelled audio tagging. The framework makes fuller use of detailed sound event information and improves performance on AudioSet.

Contribution

The paper proposes a novel two-stream framework that combines global and local analysis with class-wise attention. It addresses a limitation of previous MIL-based methods, which may not consider detailed information about sound events such as their durations.

Findings

01 Significant performance improvement on AudioSet
02 Effective exploitation of local sound event details
03 Compatibility with various baseline architectures

Abstract

Weakly labelled audio tagging aims to predict the classes of sound events within an audio clip, where the onset and offset times of the sound events are not provided. Previous works have used the multiple instance learning (MIL) framework, and exploited the information of the whole audio clip by MIL pooling functions. However, the detailed information of sound events such as their durations may not be considered under this framework. To address this issue, we propose a novel two-stream framework for audio tagging by exploiting the global and local information of sound events. The global stream aims to analyze the whole audio clip in order to capture the local clips that need to be attended using a class-wise selection module. These clips are then fed to the local stream to exploit the detailed information for a better decision. Experimental results on the AudioSet show that our proposed…
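
The two-stream flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The selection rule (a fixed-width window centred on the peak frame of each top-scoring class), the use of max pooling as the MIL pooling function, the equal-weight fusion, and every name here (`clip_scores`, `select_local_clips`, `tag`, `local_model`, `top_k`, `width`) are assumptions made for the example.

```python
from typing import Callable, List, Tuple

# frames[t][c] is the global stream's score for class c at frame t.
Frames = List[List[float]]


def clip_scores(frames: Frames) -> List[float]:
    """MIL max pooling: one clip-level score per class (assumed pooling choice)."""
    n_classes = len(frames[0])
    return [max(frame[c] for frame in frames) for c in range(n_classes)]


def select_local_clips(
    frames: Frames, top_k: int, width: int
) -> List[Tuple[int, int, int]]:
    """Hypothetical class-wise selection.

    For each of the top_k classes by clip-level score, return
    (class_index, start, end) for a window of `width` frames centred on
    the frame where that class peaks, clamped to the clip boundaries.
    """
    scores = clip_scores(frames)
    top = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:top_k]
    n_frames = len(frames)
    windows = []
    for c in top:
        peak = max(range(n_frames), key=lambda t: frames[t][c])
        start = min(max(peak - width // 2, 0), max(n_frames - width, 0))
        windows.append((c, start, min(start + width, n_frames)))
    return windows


def tag(
    frames: Frames,
    local_model: Callable[[int, int], List[float]],
    top_k: int = 2,
    width: int = 3,
) -> List[float]:
    """Fuse global and local predictions for the selected classes.

    `local_model(start, end)` is a stand-in for the local stream: it is
    expected to return per-class scores for the clip between the frames
    start and end. Averaging the two streams is an assumed fusion rule.
    """
    global_pred = clip_scores(frames)
    fused = list(global_pred)
    for c, start, end in select_local_clips(frames, top_k, width):
        local_pred = local_model(start, end)
        fused[c] = 0.5 * (global_pred[c] + local_pred[c])
    return fused
```

In this sketch only the classes picked by the global stream are re-scored by the local stream; all other classes keep their global score. In practice `local_model` would be a network applied to the audio segment behind the selected frames.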

Code & Models

Repositories

WangHelin1997/GL-AT (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Video Analysis and Summarization