Enhanced Deep Speech Separation in Clustered Ad Hoc Distributed Microphone Environments
Jihyun Kim, Stijn Kindt, Nilesh Madhu, Hong-Goo Kang

TL;DR
This paper improves deep speech separation in unpredictable ad-hoc microphone setups by integrating Transform-Average-Concatenate (TAC) layers with dual-path transformers and by clustering microphones before enhancement to improve information fusion across them.
Contribution
It introduces a novel combination of TAC layers, dual-path transformers, and clustering to better handle variable microphone configurations in speech separation tasks.
Findings
Clustering microphones around the sources of interest before enhancement improves speech separation performance.
The deep cluster-informed approach improves robustness to the variability of ad-hoc environments.
Integration of TAC and transformers yields better source separation.
Abstract
Ad-hoc distributed microphone environments, where microphone locations and numbers are unpredictable, present a challenge to traditional deep learning models, which typically require fixed architectures. To tailor deep learning models to accommodate arbitrary array configurations, the Transform-Average-Concatenate (TAC) layer was previously introduced. In this work, we integrate TAC layers with dual-path transformers for speech separation from two simultaneous talkers in realistic settings. However, the distributed nature makes it hard to fuse information across microphones efficiently. Therefore, we explore the efficacy of blindly clustering microphones around sources of interest prior to enhancement. Experimental results show that this deep cluster-informed approach significantly improves the system's capacity to cope with the inherent variability observed in ad-hoc distributed…
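The TAC idea mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name `TACSketch`, the layer sizes, the ReLU activation, the random weights, and the residual connection are all assumptions made for illustration. It only shows why such a layer can accept any number of microphones: each channel is transformed with shared weights, the results are averaged across channels, and the average is concatenated back onto every channel.

```python
import numpy as np


def relu(x):
    """Element-wise ReLU (assumed activation for this sketch)."""
    return np.maximum(x, 0.0)


class TACSketch:
    """Hypothetical, untrained Transform-Average-Concatenate layer."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        # Weights are shared across microphones, so the layer has no
        # parameter that depends on the number of channels.
        self.w_transform = rng.standard_normal((feat_dim, hidden_dim)) * scale
        self.w_average = rng.standard_normal((hidden_dim, hidden_dim)) * scale
        self.w_concat = rng.standard_normal((2 * hidden_dim, feat_dim)) * scale

    def __call__(self, x):
        # x has shape (n_mics, n_frames, feat_dim).
        # Transform: apply the same projection to every microphone.
        h = relu(x @ self.w_transform)
        # Average: pool over the microphone axis, then project the mean.
        mean = relu(h.mean(axis=0, keepdims=True) @ self.w_average)
        mean = np.broadcast_to(mean, h.shape)
        # Concatenate: append the pooled summary to each microphone's
        # features and project back to the input dimension.
        out = relu(np.concatenate([h, mean], axis=-1) @ self.w_concat)
        # Residual connection (an assumption of this sketch).
        return x + out


layer = TACSketch(feat_dim=8, hidden_dim=16)
three_mics = np.random.default_rng(1).standard_normal((3, 10, 8))
five_mics = np.random.default_rng(2).standard_normal((5, 10, 8))
print(layer(three_mics).shape, layer(five_mics).shape)
```

Because the only cross-channel operation is a mean, the same layer processes three or five microphones without modification, and reordering the microphones simply reorders the outputs.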
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques
