CasNet: Investigating Channel Robustness for Speech Separation

Fan-Lin Wang; Yao-Fei Cheng; Hung-Shin Lee; Yu Tsao; Hsin-Min Wang

arXiv:2210.15370·cs.SD·October 28, 2022·1 cites

TL;DR

CasNet is a novel deep learning framework that enhances speech separation robustness by incorporating channel embeddings, effectively addressing channel mismatch issues in real-world scenarios.

Contribution

Introduces CasNet, a channel-aware speech separation network that uses channel embeddings and FiLM (feature-wise linear modulation) to improve performance under channel mismatch conditions.

Findings

01 CasNet outperforms baseline TasNet in experiments.

02 Channel embeddings improve robustness to channel mismatch.

03 Training strategies influence the role of channel information.

Abstract

Recording-channel mismatch between training and testing conditions has been shown to be a serious problem for speech separation: it greatly reduces separation performance and leaves systems unable to meet the requirements of daily use. In this study, building on our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing a channel-aware audio separation network (CasNet), a deep learning framework for end-to-end time-domain speech separation. CasNet is implemented on top of TasNet. A channel embedding, which characterizes the channel information in a mixture of multiple utterances, is generated by a Channel Encoder and introduced into the separation module through the FiLM technique. Through two training strategies, we explore two roles that the channel embedding may play: 1) a real-life noise disturbance, making the model more robust, or 2) a guide, instructing the…
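
The FiLM step mentioned in the abstract can be illustrated with a minimal sketch. FiLM (feature-wise linear modulation) scales and shifts each feature map using a scale and a shift predicted from a conditioning vector, here a channel embedding. Everything below is an assumption made for illustration: the function names, the toy shapes, and the hand-picked weights are hypothetical, and this is not the authors' implementation or the layout of CasNet's separation module.

```python
# Illustrative FiLM conditioning in pure Python (hypothetical names and values).

def linear(weights, bias, x):
    """Affine projection: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(features, channel_embedding, gamma_layer, beta_layer):
    """Apply feature-wise linear modulation.

    features:          list of feature maps, each a list of frame values
    channel_embedding: conditioning vector summarising the recording channel
    gamma_layer:       (weights, bias) projecting the embedding to per-feature scales
    beta_layer:        (weights, bias) projecting the embedding to per-feature shifts
    """
    gamma = linear(*gamma_layer, channel_embedding)
    beta = linear(*beta_layer, channel_embedding)
    # Each feature map gets its own scale and shift, shared across all frames.
    return [[g * value + b for value in row]
            for row, g, b in zip(features, gamma, beta)]


# Toy example: 2 feature maps, 3 frames, 2-dimensional channel embedding.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
channel_embedding = [0.5, -1.0]
gamma_layer = ([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
beta_layer = ([[0.0, 0.0], [1.0, 1.0]], [0.5, 0.0])

modulated = film(features, channel_embedding, gamma_layer, beta_layer)

# With zero projection weights, a scale bias of 1 and a shift bias of 0,
# the conditioning has no effect and the features pass through unchanged.
identity_gamma = ([[0.0, 0.0], [0.0, 0.0]], [1.0, 1.0])
identity_beta = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
unchanged = film(features, channel_embedding, identity_gamma, identity_beta)
```

In a trained network the two projections would be learned jointly with the separator, so the embedding decides how strongly each feature map is amplified, suppressed, or offset.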

Code & Models

Repositories

sinica-slam/casnet (Official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing