Complex-valued Spatial Autoencoders for Multichannel Speech Enhancement
Mhd Modar Halimeh, Walter Kellermann

TL;DR
This paper introduces a novel online multichannel speech enhancement method using complex-valued neural networks that effectively exploits spatial and spectral information for improved speech quality.
Contribution
The paper presents the complex-valued spatial autoencoder, a deep complex-valued neural network that estimates complex-valued masks for a filter-and-sum framework and can manipulate both the phase and the amplitude of the microphone signals.
Findings
Superior speech quality compared to baseline methods
Exploits spatial and spectral characteristics effectively
Physically plausible spatial selectivity achieved
Abstract
In this contribution, we present a novel online approach to multichannel speech enhancement. The proposed method estimates the enhanced signal through a filter-and-sum framework. More specifically, complex-valued masks are estimated by a deep complex-valued neural network, termed the complex-valued spatial autoencoder. The proposed network is capable of exploiting as well as manipulating both the phase and the amplitude of the microphone signals. As shown by the experimental results, the proposed approach is able to exploit both spatial and spectral characteristics of the desired source signal, resulting in physically plausible spatial selectivity and superior speech quality compared to the baseline methods.
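The filter-and-sum step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the masks and microphone signals are short-time Fourier transform coefficients of shape (channels, frequency bins, frames) and that each mask multiplies its channel directly, without conjugation; the abstract does not specify these conventions. The function name `filter_and_sum` and the toy masks are hypothetical, and the masks are built by hand here rather than estimated by a network.

```python
import numpy as np

def filter_and_sum(masks, mics):
    """Apply one complex mask per channel and sum over channels.

    masks, mics: complex arrays of shape (channels, freq_bins, frames).
    Returns a complex array of shape (freq_bins, frames).
    """
    if masks.shape != mics.shape:
        raise ValueError("masks and mics must have the same shape")
    # A complex multiplication scales the amplitude and rotates the phase
    # of each time-frequency bin before the channels are summed.
    return np.sum(masks * mics, axis=0)

# Toy scene (assumption): one source, each microphone sees it with a
# per-channel, per-frequency phase shift and no noise.
rng = np.random.default_rng(0)
n_ch, n_freq, n_frames = 4, 257, 10
source = rng.standard_normal((n_freq, n_frames)) \
    + 1j * rng.standard_normal((n_freq, n_frames))
phase = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(n_ch, n_freq, 1)))
mics = phase * source[None, :, :]

# Hand-crafted masks that undo the phase shifts and average the channels,
# standing in for masks a network would estimate.
masks = np.broadcast_to(np.conj(phase) / n_ch, mics.shape)

enhanced = filter_and_sum(masks, mics)
```

In this noise-free toy case the phase-aligning masks recover the source exactly, which shows why access to phase, and not only amplitude, is what gives the masks spatial selectivity.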
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Acoustic Wave Phenomena Research
