WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction

Shuai Wang; Ke Zhang; Shaoxiong Lin; Junjie Li; Xuefei Wang; Meng Ge; Jianwei Yu; Yanmin Qian; Haizhou Li

arXiv:2409.15799·eess.AS·September 25, 2024

TL;DR

WeSep is a comprehensive, open-source toolkit for target speaker extraction that offers flexible modeling, scalable data handling, and deployment features, advancing research and practical use in multi-talker speech separation.

Contribution

The paper introduces WeSep, a new scalable and flexible toolkit for target speaker extraction, addressing the lack of open-source resources in this field.

Findings

01 Effective on-the-fly data simulation demonstrated

02 Structured recipes facilitate research and deployment

03 Toolkit supports various target speaker modeling approaches

Abstract

Target speaker extraction (TSE) focuses on isolating the speech of a specific target speaker from overlapped multi-talker speech, which is a typical setup in the cocktail party problem. In recent years, TSE has drawn increasing attention due to its potential for applications such as user-customized interfaces and hearing aids, and as a crucial front-end processing technology for downstream tasks such as speech recognition and speaker recognition. However, there are currently few open-source toolkits or pre-trained models available for off-the-shelf use. In this work, we introduce WeSep, a toolkit designed for research and practical applications in TSE. WeSep features flexible target speaker modeling, scalable data management, effective on-the-fly data simulation, structured recipes, and deployment support. The toolkit is publicly available at…
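To make the idea of on-the-fly data simulation for TSE concrete, here is a minimal sketch of how a training triple (mixture, enrollment, target) could be generated at load time instead of being pre-mixed on disk. This is not WeSep's code or API: the function names `mix_at_snr` and `simulate_example`, the `utts_by_speaker` dictionary layout, the two-speaker setup, and the SNR range are all assumptions made for illustration, and NumPy is assumed to be installed.

```python
import numpy as np


def mix_at_snr(target, interferer, snr_db, rng):
    """Add an interferer to the target at the requested SNR in dB (hypothetical helper)."""
    n = len(target)
    # Randomly crop, or zero-pad, the interferer to the target length.
    if len(interferer) >= n:
        start = rng.integers(0, len(interferer) - n + 1)
        interferer = interferer[start:start + n]
    else:
        interferer = np.pad(interferer, (0, n - len(interferer)))
    # Scale the interferer so that target power / interferer power matches snr_db.
    p_target = np.mean(target ** 2)
    p_interf = np.mean(interferer ** 2) + 1e-12
    gain = np.sqrt(p_target / (p_interf * 10 ** (snr_db / 10)))
    return target + gain * interferer


def simulate_example(utts_by_speaker, rng, snr_range=(-5.0, 5.0)):
    """Draw one (mixture, enrollment, target, snr_db) tuple (hypothetical helper).

    Assumes at least two speakers, and at least two utterances for the
    speaker chosen as target (one to extract, one to enroll with).
    """
    speakers = sorted(utts_by_speaker)
    i_tgt, i_int = rng.choice(len(speakers), size=2, replace=False)
    tgt_utts = utts_by_speaker[speakers[i_tgt]]
    int_utts = utts_by_speaker[speakers[i_int]]
    # Target and enrollment are different utterances of the same speaker.
    k_tgt, k_enr = rng.choice(len(tgt_utts), size=2, replace=False)
    target, enroll = tgt_utts[k_tgt], tgt_utts[k_enr]
    interferer = int_utts[rng.integers(len(int_utts))]
    snr_db = rng.uniform(*snr_range)
    mixture = mix_at_snr(target, interferer, snr_db, rng)
    return mixture, enroll, target, snr_db
```

Calling `simulate_example` once per training step yields a fresh speaker pairing and SNR each time, which is the general appeal of simulating mixtures online: the set of possible mixtures is not limited by what was written to disk beforehand.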

Code & Models

Repositories

wenet-e2e/wesep (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Softmax · Attention Is All You Need