ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration

Chenda Li; Jing Shi; Wangyou Zhang; Aswin Shanmugam Subramanian; Xuankai Chang; Naoyuki Kamo; Moto Hira; Tomoki Hayashi; Christoph Boeddeker; Zhuo Chen; Shinji Watanabe

arXiv:2011.03706·eess.AS·November 18, 2021

TL;DR

ESPnet-SE is a comprehensive toolkit for speech enhancement and separation that integrates automatic speech recognition, supporting various functionalities and benchmark datasets for efficient development and validation.

Contribution

ESPnet-SE introduces a unified framework that combines speech enhancement, separation, and recognition with multi-channel processing and all-in-one recipes, which differentiates it from existing tools.

Findings

01 Effective processing of single-channel and multi-channel data

02 Supports dereverberation, denoising, and source separation

03 Achieves competitive results on benchmark datasets

Abstract

We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with an optional downstream speech recognition module. ESPnet-SE is a new project that integrates rich automatic speech recognition-related models, resources, and systems to support and validate the proposed front-end implementation (i.e., speech enhancement and separation). It is capable of processing both single-channel and multi-channel data, with various functionalities including dereverberation, denoising, and source separation. We provide all-in-one recipes including data pre-processing, feature extraction, training, and evaluation pipelines for a wide range of benchmark datasets. This paper describes the design of the toolkit, several important functionalities, especially the speech recognition integration, which differentiates ESPnet-SE…
