Cross-Talk Reduction
Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe

TL;DR
This paper introduces a novel task called cross-talk reduction (CTR) and proposes CTRnet, an unsupervised or weakly-supervised neural speech separation approach that reduces the cross-talk speech picked up by each speaker's close-talk microphone in multi-microphone recordings.
Contribution
The paper presents CTRnet, a neural network method for cross-talk reduction that can be trained in unsupervised or weakly-supervised settings, using the close-talk and far-field mixtures themselves as input.
Findings
Effective in simulated two-speaker CTR tasks
Improves speech separation in real-recorded conversational data
Demonstrates potential for practical multi-microphone speech processing
Abstract
While far-field multi-talker mixtures are recorded, each speaker can wear a close-talk microphone so that close-talk mixtures can be recorded at the same time. Although each close-talk mixture has a high signal-to-noise ratio (SNR) of the wearer, it has a very limited range of applications, as it also contains significant cross-talk speech by other speakers and is not clean enough. In this context, we propose a novel task named cross-talk reduction (CTR) which aims at reducing cross-talk speech, and a novel solution named CTRnet which is based on unsupervised or weakly-supervised neural speech separation. In unsupervised CTRnet, close-talk and far-field mixtures are stacked as input for a DNN to estimate the close-talk speech of each speaker. It is trained in an unsupervised, discriminative way such that the DNN estimate for each speaker can be linearly filtered to cancel out the…
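The two operations the abstract names, stacking the mixtures as DNN input and linearly filtering per-speaker estimates, can be illustrated with a small NumPy sketch. The abstract is cut off before it states the full training objective, so everything below is an assumption made for illustration, not CTRnet itself: the toy signals, the time-domain FIR filters, the least-squares fit to a single close-talk mixture, and the helper names `delay`, `stack_mixtures`, `fir_matrix`, and `filtered_fit_residual` are all hypothetical. No DNN is involved; the true sources stand in for "good" estimates and unrelated noise for "bad" ones.

```python
import numpy as np

def delay(x, k):
    """Delay signal x by k samples, zero-padding at the start."""
    return np.concatenate([np.zeros(k), x[:len(x) - k]])

def stack_mixtures(close_talk, far_field):
    """Stack close-talk (C, T) and far-field (P, T) mixtures along the channel axis."""
    return np.concatenate([close_talk, far_field], axis=0)

def fir_matrix(x, taps):
    """(T, taps) matrix whose k-th column is x delayed by k samples."""
    return np.stack([delay(x, k) for k in range(taps)], axis=1)

def filtered_fit_residual(estimates, mixture, taps):
    """Fit one FIR filter per estimate by least squares so the filtered
    estimates sum to `mixture`; return residual energy / mixture energy.
    (Assumed objective for illustration only.)"""
    A = np.concatenate([fir_matrix(e, taps) for e in estimates], axis=1)
    w, *_ = np.linalg.lstsq(A, mixture, rcond=None)
    residual = mixture - A @ w
    return float(np.sum(residual ** 2) / np.sum(mixture ** 2))

# Toy two-speaker scene (assumed): each close-talk mixture is the wearer's
# speech plus attenuated, delayed cross-talk from the other speaker.
rng = np.random.default_rng(0)
T = 2000
s1, s2 = rng.standard_normal((2, T))
y1 = s1 + 0.3 * delay(s2, 2)               # close-talk mic of speaker 1
y2 = s2 + 0.3 * delay(s1, 2)               # close-talk mic of speaker 2
y_far = 0.5 * delay(s1, 3) + 0.5 * delay(s2, 5)  # one far-field mic

# Input that a separation network would receive: all mixtures stacked.
stacked = stack_mixtures(np.stack([y1, y2]), y_far[None, :])

# Accurate per-speaker estimates can be filtered to explain the mixture;
# unrelated signals cannot.
good = filtered_fit_residual([s1, s2], y1, taps=4)
noise = rng.standard_normal((2, T))
bad = filtered_fit_residual([noise[0], noise[1]], y1, taps=4)
print(stacked.shape, good, bad)
```

In this toy setup the residual is close to zero only when the per-speaker signals are accurate, which is the kind of signal a filtering-based criterion can provide without clean reference speech.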
Taxonomy
Topics: Multimedia Communication and Technology
