Instance-level loss based multiple-instance learning framework for acoustic scene classification
Won-Gook Choi, Joon-Hyuk Chang, Jae-Mo Yang, Han-Gil Moon

TL;DR
This paper introduces an improved instance-level loss-based multiple-instance learning framework for acoustic scene classification, significantly enhancing accuracy and effectively addressing the underestimation problem in MIL.
Contribution
The study develops a novel MIL framework with instance-level labels and loss, and designs a lightweight convolutional module, leading to improved classification performance in ASC.
Findings
Achieved up to 11% accuracy improvement over vanilla MIL.
Attained 81.1% and 72.3% accuracy on TAU 2019 and 2020 datasets.
Outperformed other models with under 1 million parameters.
Abstract
In the acoustic scene classification (ASC) task, an acoustic scene consists of diverse sounds and is inferred by identifying combinations of distinct attributes among them. This study aims to extract and cluster these attributes effectively using an improved multiple-instance learning (MIL) framework for ASC. MIL, a weakly supervised learning method, is a strategy for extracting instances from a bundle of frames composing an input audio clip and inferring the scene corresponding to the input data using these unlabeled instances. However, many studies have pointed out an underestimation problem of MIL. In this study, we develop a MIL framework more suitable for ASC systems by defining instance-level labels and loss to extract and cluster instances effectively. Furthermore, we design a fully separated convolutional module, which is a lightweight neural network comprising pointwise,…
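The difference between a bag-level (vanilla) MIL loss and an instance-level loss can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how instance-level labels are defined, so the sketch assumes the simplest option, where every instance inherits the clip's scene label. The max-pooling aggregation, the function names, and the toy logits are likewise hypothetical choices made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])

def bag_level_loss(instance_logits, bag_label):
    """Vanilla MIL (assumed variant): max-pool the logits over instances
    for each class, then apply one cross-entropy to the pooled vector."""
    num_classes = len(instance_logits[0])
    pooled = [max(inst[c] for inst in instance_logits)
              for c in range(num_classes)]
    return cross_entropy(softmax(pooled), bag_label)

def instance_level_loss(instance_logits, bag_label):
    """Instance-level loss (assumption: each instance inherits the bag
    label): average the cross-entropy computed on every instance."""
    losses = [cross_entropy(softmax(inst), bag_label)
              for inst in instance_logits]
    return sum(losses) / len(losses)

# Toy bag: 3 instances extracted from one clip, 3 scene classes (made-up values).
bag = [[2.0, 0.1, -1.0],
       [0.3, 0.2, 0.1],
       [-0.5, 1.5, 0.0]]
label = 0  # hypothetical scene label of the clip

bag_loss = bag_level_loss(bag, label)
inst_loss = instance_level_loss(bag, label)
print(f"bag-level loss: {bag_loss:.4f}, instance-level loss: {inst_loss:.4f}")
```

In this sketch every instance contributes its own term to the instance-level loss, whereas the pooled loss only sees the per-class maxima. A combined objective could weight the two terms, but the weighting used in the paper is not stated in the abstract.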
Taxonomy
Topics: Music and Audio Processing
