Power Pooling Operators and Confidence Learning for Semi-Supervised Sound Event Detection
Yuzhuo Liu, Hangting Chen, Pengyuan Zhang

TL;DR
This paper introduces a confidence learning method and a power pooling function for semi-supervised sound event detection, significantly improving accuracy and reducing error rates by leveraging confidence-weighted data and nonlinear pooling.
Contribution
It proposes a novel confidence-based weighting scheme and a trainable power pooling function, enhancing semi-supervised sound event detection performance.
Findings
Confidence correlates with prediction accuracy.
Power pooling outperforms linear pooling.
34% relative error rate reduction achieved.
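To make the pooling comparison concrete, here is a minimal sketch of the two aggregation functions. It assumes power pooling takes the form Σ yᵢ·yᵢⁿ / Σ yᵢⁿ over frame-level probabilities yᵢ, which is not stated in the text on this page. The function names and the non-negative exponent `n` are illustrative, and in a real model `n` would be a learnable parameter rather than a plain argument.

```python
def linear_pooling(frame_probs):
    """Linear softmax pooling: each frame is weighted by its own probability."""
    total = sum(frame_probs)
    if total == 0.0:
        return 0.0
    return sum(p * p for p in frame_probs) / total


def power_pooling(frame_probs, n):
    """Power pooling sketch (assumed form): frames are weighted by p ** n, n >= 0.

    n = 0 reduces to mean pooling, n = 1 to linear pooling,
    and large n approaches max pooling.
    """
    weights = [p ** n for p in frame_probs]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(p * w for p, w in zip(frame_probs, weights)) / total


# Frame-level probabilities for one event class in one clip (made-up values).
frames = [0.1, 0.2, 0.9]
clip_mean_like = power_pooling(frames, 0)
clip_linear_like = power_pooling(frames, 1)
clip_max_like = power_pooling(frames, 50)
```

Under this assumed form, a single exponent interpolates between mean, linear, and max pooling, which is one way a trainable coefficient could introduce nonlinearity.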
Abstract
In recent years, the involvement of synthetic strongly labeled data, weakly labeled data and unlabeled data has drawn much research attention in semi-supervised sound event detection (SSED). Self-training models carry out predictions without strong annotations and then take predictions with high probabilities as pseudo-labels for retraining. Such models have shown their effectiveness in SSED. However, probabilities are poorly calibrated confidence estimates, and samples with low probabilities are ignored. Hence, we introduce a method of learning confidence deliberately and retaining all data distinctly by applying confidence as weights. Additionally, linear pooling has been considered a state-of-the-art aggregation function for SSED with weak labeling. In this paper, we propose a power pooling function whose coefficient can be trained automatically to achieve nonlinearity. A…
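The idea of keeping every sample but weighting it by confidence can be sketched as follows. This is a generic confidence-weighted binary cross-entropy, not the paper's actual loss: the function name, the per-sample confidence values in [0, 1], and the normalisation by the sum of weights are all assumptions made for illustration.

```python
import math


def confidence_weighted_bce(probs, pseudo_labels, confidence, eps=1e-7):
    """Hypothetical loss: per-sample BCE against pseudo-labels, scaled by confidence.

    Low-confidence samples are down-weighted instead of being discarded by a
    hard probability threshold.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for p, y, c in zip(probs, pseudo_labels, confidence):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        bce = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
        weighted_sum += c * bce
        weight_total += c
    # Normalising by the total weight is a design choice of this sketch.
    return weighted_sum / max(weight_total, eps)


# Made-up values: the second sample has low confidence, so it contributes little.
loss = confidence_weighted_bce(
    probs=[0.9, 0.2, 0.6],
    pseudo_labels=[1.0, 1.0, 0.0],
    confidence=[0.95, 0.10, 0.70],
)
```

With all confidences set to 1 this reduces to an ordinary mean binary cross-entropy; with a confidence of 0 the corresponding sample drops out of the loss entirely.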
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Animal Vocal Communication and Behavior
