Spatial-Magnifier: Spatial upsampling for multichannel speech enhancement
Dongheon Lee, Ashutosh Pandey, Sanjeel Parekh, Daniel Wong, Jacob Donley, Buye Xu, Juan Azcarreta

TL;DR
Spatial-Magnifier is a neural network that creates virtual microphone signals from limited real microphone data, enhancing multichannel speech processing on edge devices.
Contribution
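It introduces Spatial-Magnifier, a network for spatial upsampling, together with the Spatial Audio Representation Learning (SARL) framework, which conditions a downstream speech enhancement system on estimated virtual microphone signals and features. The goal is to improve enhancement performance when only a few physical microphones are available.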
Findings
Outperforms existing spatial upsampling baselines.
Nearly recovers the oracle performance obtained with all microphones.
Improves several speech extraction systems, including end-to-end multichannel speech enhancement and neural beamforming.
Abstract
While the spatial directivity of multichannel speech enhancement algorithms improves with the number of microphones, fitting large capture arrays into real-world edge devices is typically limited by physical constraints. To overcome this limitation, we propose Spatial-Magnifier, a neural network designed to generate virtual microphone (VM) signals from a limited set of real microphone (RM) measurements. Moreover, we introduce the Spatial Audio Representation Learning (SARL) framework, which leverages estimated VM signals and features to condition a downstream speech enhancement system. Experimental results demonstrate that the proposed framework outperforms existing spatial upsampling baselines across various speech extraction systems, including end-to-end multichannel speech enhancement and neural beamforming. The proposed method nearly recovers the oracle performance achieved when all…
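The premise that spatial selectivity improves with the number of microphones can be illustrated with a classical delay-and-sum beamformer. The sketch below is not the paper's method: it assumes a uniform linear array, a far-field source, and spatially white noise, and the function names, spacing, angle, and frequency are all hypothetical choices made for illustration. Under those assumptions the white noise gain, one simple measure of array gain, equals the number of microphones.

```python
import numpy as np


def steering_vector(num_mics, spacing_m, angle_rad, freq_hz, c=343.0):
    """Far-field steering vector for a uniform linear array (assumed geometry)."""
    positions = np.arange(num_mics) * spacing_m      # mic positions along the array axis
    delays = positions * np.cos(angle_rad) / c       # relative propagation delays in seconds
    return np.exp(-2j * np.pi * freq_hz * delays)    # unit-magnitude phase terms


def white_noise_gain_db(num_mics, spacing_m=0.02, angle_rad=np.pi / 3, freq_hz=1000.0):
    """White noise gain (dB) of a delay-and-sum beamformer steered at the source."""
    d = steering_vector(num_mics, spacing_m, angle_rad, freq_hz)
    w = d / num_mics                                 # delay-and-sum weights, so that w^H d = 1
    gain = np.abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, w))
    return 10.0 * np.log10(gain)


if __name__ == "__main__":
    for m in (2, 4, 8):
        print(f"{m} mics: {white_noise_gain_db(m):.2f} dB")
```

In this idealized setting, each doubling of the microphone count adds about 3 dB of gain against uncorrelated noise, which is the kind of benefit that virtual microphone signals aim to recover when physical microphones are scarce.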
