Open-Source Conversational AI with SpeechBrain 1.0

Mirco Ravanelli; Titouan Parcollet; Adel Moumen; Sylvain de Langen; Cem Subakan; Peter Plantinga; Yingzhi Wang; Pooneh Mousavi; Luca Della Libera; Artem Ploujnikov; Francesco Paissan; Davide Borra; Salah Zaiem; Zeyu Zhao; Shucong Zhang; Georgios Karakasidis; Sung-Lin Yeh; Pierre Champion; Aku Rouhe; Rudolf Braun; Florian Mai; Juan Zuluaga-Gomez; Seyed Mahed Mousavi; Andreas Nautsch; Ha Nguyen; Xuechen Liu; Sangeet Sagar; Jarod Duret; Salima Mdhaffar; Gaelle Laperriere; Mickael Rouvier; Renato De Mori; Yannick Esteve

arXiv:2407.00463·cs.LG·November 11, 2024·2 cites

TL;DR

SpeechBrain 1.0 is an open-source toolkit that advances speech and language processing with new models, technologies, and a unified benchmark platform, promoting transparency and reproducibility in conversational AI research.

Contribution

The paper introduces SpeechBrain 1.0, which offers over 200 recipes, new technologies supporting diverse learning modalities, LLM integration, advanced decoding strategies, and a new benchmark repository.

Findings

01 Over 200 speech and language processing recipes available

02 Integration of Large Language Models and advanced decoding strategies

03 New benchmark repository for model evaluation

Abstract

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified…

Code & Models

Models

🤗
speechbrain/asr-streaming-conformer-gigaspeech
model· 16 dl· ♡ 5
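
A minimal sketch of how the model listed above might be loaded for streaming transcription, assuming SpeechBrain 1.0 or later is installed and the Hugging Face Hub is reachable. Only the model ID comes from this page; the helper name `transcribe_streaming`, its parameter names, and the default chunk settings are hypothetical choices made for illustration.

```python
# Only the model ID below comes from the listing; everything else is illustrative.
MODEL_ID = "speechbrain/asr-streaming-conformer-gigaspeech"


def transcribe_streaming(wav_path, chunk_size=24, left_context_chunks=4):
    """Transcribe one audio file with the streaming Conformer model (sketch)."""
    # Imported lazily so this module still loads where SpeechBrain is absent.
    from speechbrain.inference.ASR import StreamingASR
    from speechbrain.utils.dynamic_chunk_training import DynChunkTrainConfig

    # Fetches and caches the pre-trained checkpoint on the first call.
    asr_model = StreamingASR.from_hparams(source=MODEL_ID)

    # Chunk size and left context trade latency against accuracy;
    # the defaults here are assumptions, not recommended settings.
    config = DynChunkTrainConfig(chunk_size, left_context_chunks)
    return asr_model.transcribe_file(wav_path, config)
```

Calling `transcribe_streaming("my_audio.wav")` with a local recording would be the typical use; the file name here is a placeholder.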

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques