fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino

TL;DR
fairseq S2T extends fairseq to speech-to-text tasks such as end-to-end speech recognition and speech-to-text translation. It is designed for scalability and extensibility, supports several model architectures, and integrates fairseq's machine translation and language models into end-to-end workflows.
Contribution
The paper introduces a fairseq extension for speech-to-text modeling that covers multiple architectures and integrates fairseq's machine translation and language models for multi-task or transfer learning.
Findings
Supports RNN, Transformer, and Conformer models
Provides detailed training recipes and workflows
Enables multi-task and transfer learning
Abstract
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing, model training to offline (online) inference. We implement state-of-the-art RNN-based, Transformer-based as well as Conformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T documentation and examples are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
Code & Models
- 🤗 facebook/s2t-large-librispeech-asr · model · 55 dl · ♡ 10
- 🤗 facebook/s2t-medium-librispeech-asr · model · 671 dl · ♡ 10
- 🤗 facebook/s2t-medium-mustc-multilingual-st · model · 4.0k dl · ♡ 7
- 🤗 facebook/s2t-small-covost2-ca-en-st · model · 4 dl
- 🤗 facebook/s2t-small-covost2-de-en-st · model · 7 dl · ♡ 1
- 🤗 facebook/s2t-small-covost2-en-ca-st · model · 6 dl
- 🤗 facebook/s2t-small-covost2-en-de-st · model · 7 dl · ♡ 1
- 🤗 facebook/s2t-small-covost2-en-et-st · model · 4 dl
- 🤗 facebook/s2t-small-covost2-en-fa-st · model · 5 dl · ♡ 3
- 🤗 facebook/s2t-small-covost2-es-en-st · model · 4 dl
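The checkpoints listed above are hosted on the Hugging Face Hub, so one plausible way to try them is through the `transformers` library rather than fairseq itself. The following is a minimal sketch, not the authors' own recipe. It assumes `transformers`, `torch`, and `sentencepiece` are installed, that the input is mono audio sampled at 16 kHz, and that the checkpoint can be downloaded. The helper name `transcribe` and the constants are illustrative.

```python
"""Sketch: run a fairseq S2T checkpoint from the Hugging Face Hub on a waveform."""
from typing import List, Sequence

# Checkpoint name taken from the model list on this page.
MODEL_ID = "facebook/s2t-medium-librispeech-asr"

# Assumption: the checkpoint expects 16 kHz mono audio.
SAMPLING_RATE = 16_000


def transcribe(waveform: Sequence[float], model_id: str = MODEL_ID) -> List[str]:
    """Return the decoded text for one 1-D waveform (hypothetical helper)."""
    # Imported lazily so that importing this module does not pull in
    # the heavy dependencies or trigger any download.
    from transformers import (
        Speech2TextForConditionalGeneration,
        Speech2TextProcessor,
    )

    processor = Speech2TextProcessor.from_pretrained(model_id)
    model = Speech2TextForConditionalGeneration.from_pretrained(model_id)

    # The processor turns raw samples into filterbank features plus a mask.
    inputs = processor(waveform, sampling_rate=SAMPLING_RATE, return_tensors="pt")

    # Autoregressive decoding over the encoded speech features.
    generated_ids = model.generate(
        inputs["input_features"],
        attention_mask=inputs["attention_mask"],
    )
    return processor.batch_decode(generated_ids, skip_special_tokens=True)
```

Calling `transcribe(samples)` with a one-dimensional array of float samples should return a list containing one decoded string. Passing one of the `-st` checkpoint names as `model_id` would yield a translation instead of a transcript, although the multilingual checkpoint may additionally need a target-language token to be forced at generation time.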
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
