Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes
Aurora Cramer, Mark Cartwright, Fatemeh Pishdadian, Juan Pablo Bello

TL;DR
This paper introduces a weakly supervised approach for estimating the loudness of specific sound sources in noisy soundscapes, addressing data scarcity and robustness issues in real-world conditions.
Contribution
It proposes a novel method that leverages clip-level annotations and modified loss functions to improve source-specific sound level estimation.
Findings
Improved source-specific sound level estimation (SSSLE) performance over baseline source separation models
Effective handling of background noise in recordings
Validation of method's robustness in practical scenarios
Abstract
While the estimation of what sound sources are, when they occur, and from where they originate has been well-studied, the estimation of how loud these sound sources are has often been overlooked. Current solutions to this task, which we refer to as source-specific sound level estimation (SSSLE), suffer from challenges due to the impracticality of acquiring realistic data and a lack of robustness to realistic recording conditions. Recently proposed weakly supervised source separation methods offer a means of leveraging clip-level source annotations to train source separation models, which we augment with modified loss functions to bridge the gap between source separation and SSSLE and to address the presence of background. We show that our approach improves SSSLE performance compared to baseline source separation models and provide an ablation analysis to explore our method's design choices,…
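To make the SSSLE task concrete, here is a minimal sketch of how a per-frame sound level could be computed for one separated source and compared against a reference. It is not the paper's method: the function names, the RMS-in-dB level definition, the frame and hop sizes, and the mean absolute error metric are all assumptions chosen for illustration.

```python
import math

def frame_levels_db(signal, frame_len, hop, eps=1e-10):
    """Return the RMS level in dB (relative to full scale) of each frame.

    Assumption: level is defined as 10*log10(mean square); the paper may
    use a different definition (for example, a weighted sound level).
    """
    levels = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        levels.append(10.0 * math.log10(mean_square + eps))
    return levels

def mean_abs_level_error(est_db, ref_db):
    """Mean absolute difference in dB between two level sequences."""
    return sum(abs(e - r) for e, r in zip(est_db, ref_db)) / len(ref_db)

# Toy data (hypothetical): a 440 Hz sine as the reference source, and a
# "separated" estimate that is the same signal at half the amplitude.
sr = 8000
ref = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
est = [0.5 * x for x in ref]

ref_db = frame_levels_db(ref, frame_len=1000, hop=1000)
est_db = frame_levels_db(est, frame_len=1000, hop=1000)
error_db = mean_abs_level_error(est_db, ref_db)
```

In this toy case the estimate has half the reference amplitude, so its level sits about 6 dB below the reference in every frame. A separation model that captures a source's content but misjudges its gain leaves exactly this kind of level error.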
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research
