The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, Wangyou Zhang

TL;DR
The 2020 ESPnet update introduces new features, broadens the toolkit's scope to include text-to-speech (TTS), voice conversion (VC), speech translation (ST), and speech enhancement (SE), and reports state-of-the-art performance with improved models and recipes for end-to-end speech processing.
Contribution
This paper details the latest developments of ESPnet, expanding its applications and enhancing performance with new models, data augmentation, and comprehensive recipes.
Findings
Supports multiple speech processing tasks in a unified framework
Achieves state-of-the-art results on various benchmarks
Provides reproducible recipes for community use
Abstract
This paper describes the recent development of ESPnet (https://github.com/espnet/espnet), an end-to-end speech processing toolkit. The project was initiated in December 2017, mainly to support end-to-end speech recognition experiments based on sequence-to-sequence modeling. It has grown rapidly and now covers a wide range of speech processing applications. ESPnet now also includes text-to-speech (TTS), voice conversion (VC), speech translation (ST), and speech enhancement (SE) with support for beamforming, speech separation, denoising, and dereverberation. All applications are trained in an end-to-end manner, thanks to the generic sequence-to-sequence modeling properties, and they can be further integrated and jointly optimized. Also, ESPnet provides reproducible all-in-one recipes for these applications with state-of-the-art performance in various benchmarks by…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
