Communication-Cost Aware Microphone Selection For Neural Speech Enhancement with Ad-hoc Microphone Arrays
Jonah Casebeer, Jamshed Kaikaus, Paris Smaragdis

TL;DR
This paper introduces an attention-based microphone selection method for neural speech enhancement that balances communication costs and performance, adapting the number of microphones used based on scene SNR in complex acoustic environments.
Contribution
It proposes a joint learning approach for microphone selection and speech enhancement that dynamically adjusts microphone usage to optimize communication cost and task performance.
Findings
Matches fixed-microphone performance while reducing communication costs
Adapts microphone usage based on scene SNR
Effective in complex echoic environments with moving sources
Abstract
In this paper, we present a method for jointly learning a microphone selection mechanism and a speech enhancement network for multi-channel speech enhancement with an ad-hoc microphone array. The attention-based microphone selection mechanism is trained to reduce communication costs through a penalty term that represents a task-performance/communication-cost trade-off. While working within this trade-off, our method can intelligently stream from more microphones in lower-SNR scenes and fewer microphones in higher-SNR scenes. We evaluate the model in complex echoic acoustic scenes with moving sources and show that it matches the performance of models that stream from a fixed number of microphones while reducing communication costs.
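The trade-off described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the attention mechanism or the form of the penalty, so everything below is an assumption made for illustration: scaled dot-product attention over per-microphone feature vectors, a hard threshold to decide which microphones are streamed, and a penalty proportional to the number of microphones streamed. The names `select_and_fuse`, `penalized_loss`, `threshold`, and `cost_weight` are hypothetical and do not come from the paper.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def select_and_fuse(mic_features, query, threshold=0.1):
    """Attend over microphones and keep only those with enough weight.

    mic_features: array of shape (n_mics, dim), one feature vector per microphone.
    query:        array of shape (dim,), a hypothetical summary of the scene.
    threshold:    assumed cut-off below which a microphone is not streamed.
    Returns the fused feature vector and a boolean mask of streamed microphones.
    """
    dim = mic_features.shape[1]
    scores = mic_features @ query / np.sqrt(dim)  # scaled dot-product scores
    weights = softmax(scores)

    requested = weights >= threshold
    requested[np.argmax(weights)] = True  # always stream at least one microphone

    kept = np.where(requested, weights, 0.0)
    kept = kept / kept.sum()  # renormalize over the streamed microphones
    fused = kept @ mic_features
    return fused, requested


def penalized_loss(task_loss, requested, cost_weight=0.05):
    """Task loss plus an assumed per-microphone communication penalty."""
    return task_loss + cost_weight * float(np.sum(requested))
```

Raising `cost_weight` makes each streamed microphone more expensive relative to the enhancement loss, which pushes a trained selector toward fewer microphones. The hard threshold used here is not differentiable, so joint training as described in the abstract would need a relaxation or a different selection rule; the sketch only shows how the two terms trade off.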
