Improving weakly supervised sound event detection with self-supervised auxiliary tasks

Soham Deshmukh; Bhiksha Raj; Rita Singh

arXiv:2106.06858·eess.AS·June 15, 2021

TL;DR

This paper introduces a novel shared encoder architecture with self-supervised auxiliary tasks and a two-step attention pooling mechanism to enhance weakly supervised sound event detection in noisy, low-data environments without pretraining.

Contribution

The paper proposes a framework that combines self-supervised auxiliary tasks with a two-step attention pooling mechanism to improve weakly supervised sound event detection without pretraining.
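A common way to train a shared encoder with a primary and an auxiliary head is to minimise a weighted sum of the two task losses. The expression below is a sketch under that assumption, not an objective stated on this page; the symbols $\theta_e$ (shared encoder), $\theta_p$ (sound event detection head), $\theta_a$ (auxiliary decoder) and the weight $\alpha$ are illustrative names.

```latex
\mathcal{L}(\theta_e, \theta_p, \theta_a)
  = \mathcal{L}_{\text{SED}}(\theta_e, \theta_p)
  + \alpha \, \mathcal{L}_{\text{aux}}(\theta_e, \theta_a),
  \qquad \alpha \ge 0
```

Setting $\alpha = 0$ reduces this to training the detection task alone, which is the natural baseline for an ablation of the auxiliary task.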

Findings

01. Outperforms benchmarks by up to 22.3% in noisy conditions

02. Effective in low-SNR scenarios (0, 10 and 20 dB)

03. Ablation confirms the benefits of the auxiliary task and of attention pooling

Abstract

While multitask and transfer learning have been shown to improve the performance of neural networks in limited-data settings, they require pretraining the model on large datasets beforehand. In this paper, we focus on improving the performance of weakly supervised sound event detection in low-data and noisy settings simultaneously, without requiring any pretraining task. To that end, we propose a shared encoder architecture with sound event detection as the primary task and an additional secondary decoder for a self-supervised auxiliary task. We empirically evaluate the proposed framework for weakly supervised sound event detection on a remix dataset of the DCASE 2019 Task 1 acoustic scene data with DCASE 2018 Task 2 sound event data under 0, 10 and 20 dB SNR. To ensure we retain the localisation information of multiple sound events, we propose a two-step attention pooling mechanism that…
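The evaluation mixes sound events into acoustic scenes at fixed signal-to-noise ratios. As a minimal sketch of how a mixture at a target SNR can be built, the example below scales the background so that the event-to-background power ratio matches the requested value. The function names (`mean_power`, `mix_at_snr`, `snr_db`) are hypothetical, and the authors' actual remixing procedure may differ.

```python
import math


def mean_power(signal):
    """Return the mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(signal, noise):
    """Return the signal-to-noise ratio in decibels."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


def mix_at_snr(event, background, target_snr_db):
    """Scale `background` to reach the target SNR, then add it to `event`.

    Returns the mixture and the scaled background (both lists of floats).
    Assumes both inputs have the same length and non-zero power.
    """
    if len(event) != len(background):
        raise ValueError("event and background must have the same length")
    p_event = mean_power(event)
    p_background = mean_power(background)
    if p_event == 0 or p_background == 0:
        raise ValueError("both signals must have non-zero power")
    # SNR_dB = 10 * log10(P_event / (gain^2 * P_background)), solved for gain.
    gain = math.sqrt(p_event / (p_background * 10 ** (target_snr_db / 10)))
    scaled = [gain * b for b in background]
    mixture = [e + b for e, b in zip(event, scaled)]
    return mixture, scaled


if __name__ == "__main__":
    # Toy signals standing in for a sound event and an acoustic scene.
    event = [math.sin(0.1 * n) for n in range(1000)]
    background = [((n * 37) % 11 - 5) / 5.0 for n in range(1000)]
    for target in (0, 10, 20):
        _, scaled = mix_at_snr(event, background, target)
        print(target, round(snr_db(event, scaled), 6))
```

Scaling the background rather than the event keeps the event level fixed across the 0, 10 and 20 dB conditions, so only the amount of interfering scene audio changes between them.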

Code & Models

Repositories

soham97/MTL_Weakly_labelled_audio_data (PyTorch, official)
