Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection

Yunhao Liang; Yanhua Long; Yijie Li; Jiaen Liang; Yuping Wang

arXiv:2103.12388 · eess.AS · February 15, 2022

TL;DR

This paper introduces a joint training framework for weakly supervised audio tagging and acoustic event detection, utilizing deep feature distillation, adaptive focal loss, and event-specific post-processing to enhance performance.

Contribution

It proposes a novel combination of deep feature distillation, adaptive focal loss, and post-processing strategies within a teacher-student framework for improved weakly supervised audio analysis.

Findings

01. Achieved an 81.2% F1-score in audio tagging

02. Achieved a 49.8% F1-score in acoustic event detection

03. Demonstrated competitive performance on the DCASE 2019 dataset

Abstract

A good joint training framework can substantially improve the performance of weakly supervised audio tagging (AT) and acoustic event detection (AED) simultaneously. In this study, we propose three methods to improve the best teacher-student framework in the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 4 for both the audio tagging and acoustic event detection tasks. First, we propose a frame-level, target-event-based deep feature distillation, which aims to leverage the limited strong-labeled data in a weakly supervised framework to learn better intermediate feature maps. Then, we propose an adaptive focal loss and a two-stage training strategy to enable more effective and accurate model training, in which the contributions of hard and easy acoustic events to the total cost function are adjusted automatically. Furthermore,…
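
To make the idea of re-weighting hard and easy events concrete, here is a minimal sketch of the standard binary focal loss in plain Python. It is not the paper's adaptive variant, whose formulation is not given in this excerpt. The function name `binary_focal_loss`, the fixed focusing parameter `gamma`, and the clipping constant `eps` are all assumptions made for illustration.

```python
import math


def binary_focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean standard focal loss over a list of binary predictions.

    Assumption: this is the fixed-gamma focal loss, shown only for
    illustration; the adaptive scheme from the abstract is not reproduced.
    probs   -- predicted probabilities of the positive class, in [0, 1]
    targets -- ground-truth labels, each 0 or 1
    gamma   -- focusing parameter; gamma = 0 gives plain cross-entropy
    """
    total = 0.0
    for p, y in zip(probs, targets):
        # Clip to avoid log(0) on saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability assigned to the true label.
        p_t = p if y == 1 else 1.0 - p
        # (1 - p_t)^gamma shrinks the loss of well-classified examples.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # A confident correct prediction versus an uncertain one.
    print(binary_focal_loss([0.95], [1]))
    print(binary_focal_loss([0.30], [1]))
```

With `gamma = 0` the modulating factor is 1 and the function reduces to binary cross-entropy. Larger values of `gamma` shift the total cost toward examples the model currently gets wrong, which is the general behaviour the abstract attributes to its loss.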

Taxonomy

Methods: Focal Loss