SpeechBrain: A General-Purpose Speech Toolkit
Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris

TL;DR
SpeechBrain is an open-source, all-in-one toolkit that simplifies the development and benchmarking of neural speech processing systems. It reports competitive or state-of-the-art performance on a range of speech benchmarks and ships with recipes, pretrained models, and tutorials for researchers and developers.
Contribution
The paper introduces a flexible, user-friendly core architecture, together with training recipes, pretrained models, inference scripts, and tutorials, to streamline speech technology research and application development.
Findings
Achieves competitive or state-of-the-art results on multiple speech benchmarks.
Provides extensive resources including pretrained models and tutorials.
Supports a wide range of speech processing tasks.
Abstract
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference scripts for popular speech datasets, as well as tutorials which allow anyone with basic Python proficiency to familiarize themselves with speech technologies.
Code & Models
- 🤗 speechbrain/spkrec-ecapa-voxceleb · model · 1.7M downloads · ♡ 320
- 🤗 speechbrain/emotion-recognition-wav2vec2-IEMOCAP · model · 497k downloads · ♡ 184
- 🤗 LanceaKing/spkrec-ecapa-cnceleb · model · 5 downloads · ♡ 4
- 🤗 aheba31/test-predictor · model · 4 downloads
- 🤗 ddwkim/asr-conformer-transformerlm-ksponspeech · model · 7 downloads
- 🤗 moumeneb1/testing · model · 4 downloads
- 🤗 speechbrain/REAL-M-sisnr-estimator-training · model · 22 downloads
- 🤗 speechbrain/REAL-M-sisnr-estimator · model · 22 downloads · ♡ 2
- 🤗 speechbrain/SLU-direct-SLURP-hubert-enc · model · 20 downloads · ♡ 4
- 🤗 speechbrain/asr-conformer-transformerlm-ksponspeech · model · 42 downloads · ♡ 11
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
