Estimation of binary time-frequency masks from ambient noise
José Luis Romero, Michael Speckbacher

TL;DR
This paper demonstrates that a binary time-frequency mask can be reliably estimated from ambient noise observations using spectrogram averaging, without detailed knowledge of noise variance or filtering profile.
Contribution
It introduces a method to identify general binary masks from ambient noise by analyzing average spectrograms, requiring minimal prior information.
Findings
The lower quantile of averaged spectrograms identifies the mask with high confidence.
Estimation error is primarily influenced by the perimeter of the mask.
The method does not require precise noise variance or filter profile knowledge.
Abstract
We investigate the retrieval of a binary time-frequency mask from a few observations of filtered white ambient noise. Confirming household wisdom in acoustic modeling, we show that this is possible by inspecting the average spectrogram of ambient noise. Specifically, we show that the lower quantile of the average of masked spectrograms is enough to identify a rather general mask with high confidence, up to shape details concentrated near the boundary of the mask. As an application, the expected measure of the estimation error is dominated by the perimeter of the time-frequency mask. The estimator requires no knowledge of the noise variance, and only a very qualitative profile of the filtering window, but no exact knowledge of it.
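The idea of averaging spectrograms and thresholding the result can be illustrated with a small toy simulation. This is a sketch under strong assumptions, not the authors' estimator: the time-frequency plane is a discrete periodic grid, the smoothing caused by the analysis window is imitated by a circular Gaussian blur, and the threshold is a fixed fraction of an upper empirical quantile of the averaged spectrogram (which presumes the mask covers more than a tenth of the grid). The function names and the parameters `blur_std`, `ref_quantile`, and `fraction` are hypothetical choices made for this illustration.

```python
import numpy as np


def gaussian_blur(x, kernel_hat):
    """Circular convolution via FFT; a crude stand-in for the window's smoothing."""
    return np.fft.ifft2(np.fft.fft2(x) * kernel_hat)


def simulate_spectrograms(mask, n_obs, sigma, blur_std, rng):
    """Toy model: smooth complex white noise, apply the mask, smooth again."""
    n_rows, n_cols = mask.shape
    # Circular distance of each grid index from the origin, per axis.
    r = np.minimum(np.arange(n_rows), n_rows - np.arange(n_rows))[:, None]
    c = np.minimum(np.arange(n_cols), n_cols - np.arange(n_cols))[None, :]
    kernel = np.exp(-(r**2 + c**2) / (2.0 * blur_std**2))
    kernel /= kernel.sum()
    kernel_hat = np.fft.fft2(kernel)

    spectrograms = np.empty((n_obs, n_rows, n_cols))
    for i in range(n_obs):
        white = sigma * (rng.standard_normal(mask.shape)
                         + 1j * rng.standard_normal(mask.shape))
        coeffs = gaussian_blur(white, kernel_hat)            # imitates the STFT of noise
        filtered = gaussian_blur(mask * coeffs, kernel_hat)  # mask, then re-analyze
        spectrograms[i] = np.abs(filtered) ** 2
    return spectrograms


def estimate_mask(spectrograms, ref_quantile=0.9, fraction=0.3):
    """Average the spectrograms and keep points above a scale-free threshold.

    The threshold is a fraction of an empirical quantile of the average,
    so rescaling the noise variance does not change the estimate.
    """
    avg = spectrograms.mean(axis=0)
    level = fraction * np.quantile(avg, ref_quantile)
    return avg >= level


# Demo: a disc-shaped mask on a 64 x 64 grid; sigma is never passed to the estimator.
rng = np.random.default_rng(0)
n = 64
yy, xx = np.mgrid[:n, :n]
true_mask = (yy - n / 2) ** 2 + (xx - n / 2) ** 2 <= 20**2

specs = simulate_spectrograms(true_mask, n_obs=64, sigma=3.0, blur_std=1.5, rng=rng)
est_mask = estimate_mask(specs)

error_fraction = np.mean(est_mask != true_mask)
boundary_band = np.abs(np.hypot(yy - n / 2, xx - n / 2) - 20) <= 3
errors_near_boundary = np.mean(boundary_band[est_mask != true_mask]) if error_fraction > 0 else 1.0
print(f"misclassified grid points: {error_fraction:.3%}")
print(f"share of errors within 3 cells of the boundary: {errors_near_boundary:.0%}")
```

In this toy setting the misclassified points tend to sit in a thin band around the edge of the disc, which is consistent with the abstract's statement that the error is governed by the perimeter of the mask. Because the threshold is defined relative to a quantile of the data, the estimate is unchanged if every spectrogram is multiplied by a constant, mirroring the claim that the noise variance need not be known.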
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Hearing Loss and Rehabilitation
