Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks

Fan-Lin Wang; Hung-Shin Lee; Yu Tsao; Hsin-Min Wang

arXiv:2203.16040·cs.SD·June 22, 2022

TL;DR

This paper investigates how language and channel variability affect speech separation performance, finding channel differences have a larger impact than language, and proposes a projection-based method to mitigate channel mismatch effects.

Contribution

The study disentangles the effects of language and channel on speech separation, and introduces a projection-based approach to address channel mismatch issues.

Findings

1. Channel differences impact speech separation more than language differences.

2. Training on Android phone data enhances generalizability.

3. Projection-based channel similarity measurement improves performance on in-the-wild data.

Abstract

Because speech separation performs excellently on speech in which two speakers completely overlap, research attention has shifted to more realistic scenarios. However, domain mismatch between training and test conditions, caused by factors such as speaker, content, channel, and environment, remains a severe problem for speech separation. Speaker and environment mismatches have been studied in the existing literature, but there are few studies on speech content and channel mismatches. Moreover, the impacts of language and channel in these studies are mostly entangled. In this study, we create several datasets for various experiments. The results show that the impacts of different languages are small enough to be ignored compared to the impacts of different channels. In our experiments, training on data recorded by Android phones leads to the best…

Code & Models

Repositories

sinica-slam/cospro-mix (official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing