JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT

Mayumi Ohta; Julia Kreutzer; Stefan Riezler

arXiv:2210.02545 · cs.CL · October 7, 2022

TL;DR

JoeyS2T is a minimalist, easy-to-use speech-to-text toolkit built on JoeyNMT, offering competitive performance with a simple, integrated workflow for speech recognition and translation tasks.

Contribution

It extends JoeyNMT with speech-specific components, creating a unified, accessible toolkit for speech-to-text modeling that maintains simplicity and competitive accuracy.

Findings

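01 Performs competitively on English speech recognition and English-to-German speech translation benchmarks.

02 Provides an integrated, easy-to-use pipeline from data pre-processing to evaluation.

03 Maintains simplicity while including key speech modeling components: convolutional layers, SpecAugment, CTC loss, and WER evaluation.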

Abstract

JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch, seeking simplicity and accessibility. JoeyS2T's workflow is self-contained, running from data pre-processing through model training and prediction to evaluation, and is seamlessly integrated into JoeyNMT's compact and simple code base. On top of JoeyNMT's state-of-the-art Transformer-based encoder-decoder architecture, JoeyS2T provides speech-oriented components such as convolutional layers, SpecAugment, CTC loss, and WER evaluation. Despite its simplicity compared to prior implementations, JoeyS2T performs competitively on English speech recognition and English-to-German speech translation benchmarks. The implementation is accompanied by a walk-through tutorial and available…
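
The abstract lists WER (word error rate) evaluation among the speech-oriented components. As a rough illustration of what that metric measures, here is a minimal pure-Python sketch: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. This is not JoeyS2T's implementation; the function name `word_error_rate` and the whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only; tokenization is a plain
    whitespace split, which real evaluation pipelines may not use.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"):
    # 2 edits over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.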

Code & Models

Repositories

may-/joeys2t (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling