End-to-end multi-channel speaker extraction and binaural speech synthesis
Cheng Chi, Xiaoyu Li, Yuxuan Ke, Qunping Ni, Yao Ge, Xiaodong Li, Chengshi Zheng

TL;DR
This paper presents an end-to-end deep learning framework that simultaneously extracts, denoises, and spatializes speech from multi-channel inputs, significantly improving remote conferencing audio quality and spatial immersion.
Contribution
The proposed framework unifies source extraction, noise suppression, and binaural rendering into a single end-to-end model with a novel spatial loss function, advancing spatial audio processing.
Findings
Outperforms baseline methods in speech quality
Enhances spatial fidelity of binaural audio
Effective in reverberant and noisy environments
Abstract
Speech clarity and spatial audio immersion are the two most critical factors in enhancing remote conferencing experiences. Existing methods are often limited, either by the lack of spatial information when only one microphone is used, or by a strong dependence on the accuracy of direction-of-arrival estimation when a microphone array is used. To overcome these limitations, we introduce an end-to-end deep learning framework that maps multi-channel noisy and reverberant signals directly to clean, spatialized binaural speech. This framework unifies source extraction, noise suppression, and binaural rendering into one network. Within this framework, we propose a novel magnitude-weighted interaural level difference loss function that aims to improve the accuracy of spatial rendering. Extensive evaluations show that our method outperforms established…
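The abstract names a magnitude-weighted interaural level difference (ILD) loss but does not give its formula. The sketch below is one plausible reading, not the authors' definition. It assumes the ILD is computed per time-frequency bin as a dB ratio of left to right magnitudes, and that the absolute ILD error is weighted by the reference binaural magnitude so that low-energy bins contribute less. The function names, the choice of weights, and the `eps` stabilizer are all hypothetical.

```python
import numpy as np


def ild_db(left_mag, right_mag, eps=1e-8):
    """Per-bin interaural level difference in dB from magnitude spectrograms."""
    return 20.0 * np.log10((left_mag + eps) / (right_mag + eps))


def magnitude_weighted_ild_loss(est_left, est_right, ref_left, ref_right, eps=1e-8):
    """Weighted mean absolute ILD error (assumed form, not from the paper).

    All inputs are non-negative magnitude spectrograms of identical shape,
    e.g. (frequency bins, time frames). Each bin's ILD error is weighted by
    the summed reference magnitude of the two ears.
    """
    ild_error = np.abs(
        ild_db(est_left, est_right, eps) - ild_db(ref_left, ref_right, eps)
    )
    weights = ref_left + ref_right  # assumption: weight by reference energy
    return float(np.sum(weights * ild_error) / (np.sum(weights) + eps))


# Toy usage with random magnitudes standing in for STFT magnitudes.
rng = np.random.default_rng(0)
ref_l = rng.uniform(0.1, 1.0, size=(257, 50))
ref_r = rng.uniform(0.1, 1.0, size=(257, 50))

perfect = magnitude_weighted_ild_loss(ref_l, ref_r, ref_l, ref_r)
left_doubled = magnitude_weighted_ild_loss(2.0 * ref_l, ref_r, ref_l, ref_r)
```

In this toy setup a perfect estimate gives zero loss, while doubling the left-ear magnitude shifts every bin's ILD by about 6 dB, so the weighted average is also about 6 dB. A training implementation would use a differentiable framework and would typically combine such a term with a signal reconstruction loss.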
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research
