SpeechBrain: A General-Purpose Speech Toolkit

Mirco Ravanelli; Titouan Parcollet; Peter Plantinga; Aku Rouhe; Samuele Cornell; Loren Lugosch; Cem Subakan; Nauman Dawalatabad; Abdelwahab Heba; Jianyuan Zhong; Ju-Chieh Chou; Sung-Lin Yeh; Szu-Wei Fu; Chien-Feng Liao; Elena Rastorgueva; François Grondin; William Aris; Hwidong Na; Yan Gao; Renato De Mori; Yoshua Bengio

arXiv:2106.04624·eess.AS·June 10, 2021·513 cites

TL;DR

SpeechBrain is a comprehensive, open-source toolkit that simplifies the development and benchmarking of neural speech processing systems. It reaches competitive or state-of-the-art performance on a wide range of speech benchmarks and ships with training recipes, pretrained models, and tutorials for researchers and developers.

Contribution

It introduces a flexible, user-friendly architecture with prebuilt recipes, pretrained models, and tutorials, streamlining speech technology research and application development.

Findings

01 Achieves competitive or state-of-the-art results on multiple speech benchmarks.

02 Provides extensive resources including pretrained models and tutorials.

03 Supports a wide range of speech processing tasks.

Abstract

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference scripts for popular speech datasets, as well as tutorials which allow anyone with basic Python proficiency to familiarize themselves with speech technologies.

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing