EURO: ESPnet Unsupervised ASR Open-source Toolkit
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, and Sanjeev Khudanpur

TL;DR
EURO is an open-source toolkit for unsupervised speech recognition that integrates multiple self-supervised models and decoding strategies, achieving state-of-the-art results and promoting reproducibility in UASR research.
Contribution
The paper introduces EURO, an open-source UASR toolkit built on ESPnet that extends the original Wav2vec-U setup with multiple self-supervised frontends and graph-based decoding methods.
Findings
Achieves state-of-the-art UASR performance on TIMIT and LibriSpeech.
Demonstrates effectiveness of integrating various self-supervised models.
Provides a unified pipeline for easier application and reproducibility.
Abstract
This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by Wav2vec-U, originally implemented in FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, resulting in flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate…
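The adversarial training mentioned in the abstract can be illustrated with a toy sketch. In Wav2vec-U-style training, a generator maps speech representations to phoneme distributions, and a discriminator tries to tell those outputs apart from phoneme sequences derived from unpaired text. The code below is not EURO's or FAIRSEQ's implementation: the linear generator, the mean-pooled logistic discriminator, and every function name are hypothetical simplifications chosen only to show how the two losses oppose each other.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generator(features, weights):
    """Toy generator (assumption): one linear layer per frame, then softmax
    over phonemes. Real systems use convolutional layers on learned features."""
    dists = []
    for frame in features:
        logits = [sum(w * x for w, x in zip(row, frame)) for row in weights]
        dists.append(softmax(logits))
    return dists


def discriminator(phoneme_dists, disc_weights):
    """Toy discriminator (assumption): mean-pool the sequence, apply a linear
    score and a sigmoid. Returns the probability the input came from real text."""
    n = len(phoneme_dists)
    pooled = [sum(frame[k] for frame in phoneme_dists) / n
              for k in range(len(disc_weights))]
    score = sum(w * p for w, p in zip(disc_weights, pooled))
    return 1.0 / (1.0 + math.exp(-score))


def one_hot(phoneme_ids, num_phonemes):
    """Encode a phoneme ID sequence from unpaired text as one-hot vectors."""
    return [[1.0 if k == p else 0.0 for k in range(num_phonemes)]
            for p in phoneme_ids]


def gan_losses(features, real_phonemes, gen_w, disc_w, num_phonemes):
    """Return (discriminator_loss, generator_loss) for one toy example."""
    eps = 1e-12
    p_fake = discriminator(generator(features, gen_w), disc_w)
    p_real = discriminator(one_hot(real_phonemes, num_phonemes), disc_w)
    # Discriminator wants p_real -> 1 and p_fake -> 0.
    d_loss = -(math.log(p_real + eps) + math.log(1.0 - p_fake + eps))
    # Generator wants the discriminator to believe its output is real.
    g_loss = -math.log(p_fake + eps)
    return d_loss, g_loss


# Toy data: 5 speech frames of dimension 3, and a 4-phoneme inventory.
random.seed(0)
NUM_PHONEMES, FEAT_DIM = 4, 3
features = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(5)]
gen_w = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(NUM_PHONEMES)]
disc_w = [random.gauss(0, 1) for _ in range(NUM_PHONEMES)]

d_loss, g_loss = gan_losses(features, [0, 2, 1, 3], gen_w, disc_w, NUM_PHONEMES)
print(f"discriminator loss: {d_loss:.4f}, generator loss: {g_loss:.4f}")
```

No paired speech and text appears anywhere in the sketch: the phoneme sequence `[0, 2, 1, 3]` stands in for unpaired text and has no alignment with the five feature frames. The sketch omits the auxiliary regularization terms that practical systems rely on.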
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Pointwise Convolution · Convolution · Dilated Convolution · Hierarchical Feature Fusion · Kaiming Initialization · 1x1 Convolution · Efficient Spatial Pyramid · Parameterized ReLU · ESPNet
