Power pooling: An adaptive pooling function for weakly labelled sound event detection
Yuzhuo Liu, Hangting Chen, Yun Wang, Pengyuan Zhang

TL;DR
This paper introduces an adaptive power pooling function for weakly labelled sound event detection that automatically adapts to various sound sources. On two public datasets it improves the event-based F1 score by 11.4% and 10.2% relative over linear softmax pooling.
Contribution
It proposes a novel adaptive power pooling function for multiple instance learning, enhancing sound event detection performance with weak labels.
Findings
Outperforms state-of-the-art linear softmax pooling on two datasets.
Improves event-based F1 score by over 10% relative.
Applicable to other MIL tasks beyond sound event detection.
Abstract
Access to large corpora with strongly labelled sound events is expensive and difficult in engineering applications. Much research turns to address the problem of how to detect both the types and the timestamps of sound events with weak labels that only specify the types. This task can be treated as a multiple instance learning (MIL) problem, and the key to it is the design of a pooling function. In this paper, we propose an adaptive power pooling function which can automatically adapt to various sound sources. On two public datasets, the proposed power pooling function outperforms the state-of-the-art linear softmax pooling on both coarse-grained and fine-grained metrics. Notably, it improves the event-based F1 score (which evaluates the detection of event onsets and offsets) by 11.4% and 10.2% relative on the two datasets. While this paper focuses on sound event detection applications,…
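The abstract names the pooling function as the key design choice but gives no formula. As a rough illustration only, the sketch below compares linear softmax pooling, which weights each frame-level probability by itself, with a power-weighted variant that weights each frame by its probability raised to an exponent n. The power form, the function names, and the fixed value of n are assumptions made for this example and are not taken from the text above; in an adaptive setting n would be a learned parameter rather than a constant.

```python
import numpy as np

EPS = 1e-7  # keeps the denominator away from zero (illustrative choice)


def linear_softmax_pool(frame_probs):
    """Clip-level probability as a self-weighted average: sum(y^2) / sum(y)."""
    y = np.clip(np.asarray(frame_probs, dtype=float), EPS, 1.0)
    return float((y * y).sum() / y.sum())


def power_pool(frame_probs, n):
    """Assumed power-weighted pooling: sum(y * y^n) / sum(y^n).

    n = 0 reduces to mean pooling, n = 1 to linear softmax pooling,
    and a large n approaches max pooling.
    """
    y = np.clip(np.asarray(frame_probs, dtype=float), EPS, 1.0)
    w = y ** n  # per-frame weights
    return float((y * w).sum() / w.sum())


if __name__ == "__main__":
    # Hypothetical frame-level probabilities for one event class in one clip.
    probs = [0.1, 0.2, 0.9]
    for n in (0.0, 1.0, 5.0, 50.0):
        print(n, power_pool(probs, n))
```

Under this assumed form, a single exponent moves the pooled value between the mean and the maximum of the frame-level probabilities, which is one way a pooling function could adapt to events of different typical durations.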
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Softmax
