Joint Analysis of Acoustic Scenes and Sound Events Based on Multitask Learning with Dynamic Weight Adaptation
Kayo Nada, Keisuke Imoto, Takao Tsuchiya

TL;DR
This paper introduces dynamic weight adaptation methods for multitask learning models that jointly analyze acoustic scenes and sound events, improving performance by automatically balancing task losses during training.
Contribution
The paper proposes dynamic weight adaptation techniques, based on dynamic weight average and multi-focal loss, for multitask learning (MTL) models that jointly perform acoustic scene classification (ASC) and sound event detection (SED).
Findings
Improved scene classification accuracy
Enhanced sound event detection performance
Dynamic weights adapt effectively during training
Abstract
Acoustic scene classification (ASC) and sound event detection (SED) are major topics in environmental sound analysis. Considering that acoustic scenes and sound events are closely related to each other, the joint analysis of acoustic scenes and sound events using multitask learning (MTL)-based neural networks was proposed in some previous works. Conventional methods train MTL-based models using a linear combination of ASC and SED loss functions with constant weights. However, the performance of conventional MTL-based methods depends strongly on the weights of the ASC and SED losses, and it is difficult to determine the appropriate balance between the constant weights of the losses of MTL of ASC and SED. In this paper, we thus propose dynamic weight adaptation methods for MTL of ASC and SED based on dynamic weight average and multi-focal loss to adjust the learning weights…
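As a rough illustration of the contrast the abstract draws between constant and dynamic loss weights, the sketch below computes task weights in the style of dynamic weight average: each task's weight comes from the ratio of its two most recent losses, passed through a softmax with a temperature. This is a minimal sketch of the general idea, not the authors' implementation. The function names, the temperature value, and the loss numbers are all hypothetical, and the paper's exact variant is not given in the text above.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Return one weight per task from the two most recent loss values.

    A task whose loss is falling slowly (ratio near or above 1) gets a
    larger weight than a task whose loss is falling quickly.
    The temperature default is an assumption, not a value from the paper.
    """
    ratios = [cur / old for cur, old in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    num_tasks = len(ratios)
    # Scale so the weights sum to the number of tasks.
    return [num_tasks * e / total for e in exps]


def combined_loss(asc_loss, sed_loss, weights):
    """Linear combination of the ASC and SED losses with the given weights."""
    return weights[0] * asc_loss + weights[1] * sed_loss


# Hypothetical per-epoch average losses, ordered as (ASC, SED).
history = [(1.20, 0.80), (0.90, 0.76)]

# Conventional setup: constant weights chosen by hand.
constant = combined_loss(*history[-1], weights=[1.0, 1.0])

# Dynamic setup: weights recomputed from the last two epochs.
weights = dwa_weights(history[-1], history[-2])
dynamic = combined_loss(*history[-1], weights=weights)

print("weights (ASC, SED):", weights)
print("constant-weight loss:", constant)
print("dynamic-weight loss:", dynamic)
```

In this toy history the SED loss drops more slowly than the ASC loss, so the SED term receives the larger weight for the next epoch. In a real training loop the weights would be recomputed once per epoch from averaged losses.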
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Underwater Acoustics Research
