Guided learning for weakly-labeled semi-supervised sound event detection
Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian

TL;DR
This paper introduces Guided Learning, a weakly-labeled semi-supervised approach to sound event detection in which a teacher model aimed at audio tagging guides a student model aimed at boundary detection, using unlabeled data.
Contribution
The paper presents a teacher-student framework that assigns audio tagging to the teacher and boundary detection to the student, so that neither sub-target has to be traded off against the other inside a single model.
Findings
Achieves competitive performance on the DCASE2018 Task4 dataset
Effectively leverages unlabeled data for boundary detection
Demonstrates the benefit of separating sub-tasks in semi-supervised learning
Abstract
We propose a simple but efficient method termed Guided Learning for weakly-labeled semi-supervised sound event detection (SED). There are two sub-targets implied in weakly-labeled SED: audio tagging and boundary detection. Instead of designing a single model by considering a trade-off between the two sub-targets, we design a teacher model aiming at audio tagging to guide a student model aiming at boundary detection to learn using the unlabeled data. The guidance is guaranteed by the audio tagging performance gap between the two models. Meanwhile, the student model, freed from the trade-off, is able to provide better boundary detection results. We propose a principle for designing the two models based on the relation between the temporal compression scale and the two sub-targets. We also propose an end-to-end semi-supervised learning process for these two models to enable their…
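The guidance step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 0.5 threshold, the mean pooling used to turn the student's frame-level outputs into clip-level probabilities, and all function names are assumptions made here for illustration. The idea shown is only that, on an unlabeled clip, the teacher's clip-level tag predictions serve as training targets for the student.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over classes for one clip."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def teacher_pseudo_tags(teacher_clip_probs, threshold=0.5):
    """Binarize the teacher's clip-level probabilities (threshold is an assumption)."""
    return [1.0 if p >= threshold else 0.0 for p in teacher_clip_probs]


def student_clip_probs(frame_probs):
    """Pool frame-level probabilities to clip level.

    Mean pooling is a placeholder chosen for this sketch; the source does not
    say which pooling the student uses.
    """
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    return [sum(frame[c] for frame in frame_probs) / n_frames
            for c in range(n_classes)]


def guided_loss_unlabeled(teacher_clip_probs, student_frame_probs):
    """Loss for one unlabeled clip: the student is trained toward the teacher's tags."""
    targets = teacher_pseudo_tags(teacher_clip_probs)
    return binary_cross_entropy(student_clip_probs(student_frame_probs), targets)


# Toy example: 2 event classes, 2 frames of student output.
teacher_out = [0.9, 0.2]                 # teacher clip-level probabilities
student_out = [[0.8, 0.1], [0.6, 0.3]]   # student frame-level probabilities
loss = guided_loss_unlabeled(teacher_out, student_out)
```

In this sketch the teacher's output is treated as a fixed target. For weakly labeled clips, the same cross-entropy would be computed against the ground-truth tags instead of the teacher's predictions.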
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
