ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration
Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen, Shinji Watanabe

TL;DR
ESPnet-SE is a comprehensive toolkit for speech enhancement and separation that integrates automatic speech recognition, supporting various functionalities and benchmark datasets for efficient development and validation.
Contribution
ESPnet-SE introduces a unified framework that combines speech enhancement, separation, and recognition. Its multi-channel processing and all-in-one recipes differentiate it from existing tools.
Findings
Effective processing of single- and multi-channel data
Supports dereverberation, denoising, and source separation
Achieves competitive results on benchmark datasets
Abstract
We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with the optional downstream speech recognition module. ESPnet-SE is a new project which integrates rich automatic speech recognition related models, resources and systems to support and validate the proposed front-end implementation (i.e. speech enhancement and separation). It is capable of processing both single-channel and multi-channel data, with various functionalities including dereverberation, denoising and source separation. We provide all-in-one recipes including data pre-processing, feature extraction, training and evaluation pipelines for a wide range of benchmark datasets. This paper describes the design of the toolkit, several important functionalities, especially the speech recognition integration, which differentiates ESPnet-SE…
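The abstract describes a front-end (enhancement or separation) that accepts single- or multi-channel input and can optionally feed a downstream recognizer. The following is a minimal sketch of that composition in plain Python. It is not ESPnet-SE's API: `Pipeline`, `average_channels`, and `dummy_asr` are hypothetical names introduced only for illustration, and channel averaging is a stand-in for a real front-end such as a beamformer or a separation network.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Waveform = List[float]          # samples of one channel
MultiChannel = List[Waveform]   # one waveform per microphone


@dataclass
class Pipeline:
    """Hypothetical front-end + optional ASR chain (not the ESPnet-SE API)."""

    front_end: Callable[[MultiChannel], List[Waveform]]
    asr: Optional[Callable[[Waveform], str]] = None

    def __call__(self, mixture) -> Tuple[List[Waveform], Optional[List[str]]]:
        # Treat a flat list of samples as a one-channel recording.
        if mixture and isinstance(mixture[0], (int, float)):
            mixture = [list(mixture)]
        # The front-end returns one waveform per estimated source.
        streams = self.front_end(mixture)
        # Recognition is optional: run it per stream only if configured.
        if self.asr is None:
            return streams, None
        return streams, [self.asr(s) for s in streams]


def average_channels(channels: MultiChannel) -> List[Waveform]:
    """Toy front-end: average the channels into a single output stream."""
    n = len(channels)
    return [[sum(samples) / n for samples in zip(*channels)]]


def dummy_asr(wave: Waveform) -> str:
    """Toy recognizer: reports the length instead of transcribing."""
    return f"<{len(wave)} samples>"


if __name__ == "__main__":
    two_mics = [[1.0, 3.0], [3.0, 5.0]]
    print(Pipeline(average_channels)(two_mics))
    print(Pipeline(average_channels, dummy_asr)(two_mics))
```

Keeping the recognizer as an optional callable mirrors the "optional downstream speech recognition module" wording: the same front-end can be evaluated on its own or scored through recognition without changing its interface.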
Code & Models
- 🤗 espnet/Wangyou_Zhang_chime4_enh_train_enh_beamformer_mvdr_raw · model · 14 dl · ♡ 1
- 🤗 espnet/Wangyou_Zhang_chime4_enh_train_enh_dc_crn_mapping_snr_raw · model · 3 dl
- 🤗 espnet/Wangyou_Zhang_chime4_enh_train_enh_conv_tasnet_raw · model · 17 dl · ♡ 1
- 🤗 espnet/Wangyou_Zhang_wsj0_2mix_enh_dc_crn_mapping_snr_raw · model · 2 dl
- 🤗 espnet/Wangyou_Zhang_wsj0_2mix_enh_train_enh_dptnet_raw · model · 1 dl
