Seamless: Multilingual Expressive and Streaming Speech Translation
Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov

TL;DR
This paper introduces Seamless, a comprehensive system for real-time, multilingual, expressive speech translation that preserves vocal style and prosody, enabling seamless machine-mediated communication across languages with safety and bias mitigation features.
Contribution
It presents the Seamless system combining expressive, streaming, and multilingual translation models with safety and bias mitigation, advancing real-time speech translation technology.
Findings
SeamlessM4T v2, built on the updated UnitY2 framework, is trained on more low-resource language data than its predecessor.
SeamlessExpressive preserves vocal styles and prosody in translation.
SeamlessStreaming achieves low-latency, simultaneous translation for multiple languages.
Abstract
Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model, SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As…
Code & Models
- 🤗 facebook/seamless-m4t-v2-large (model) · 93k downloads · 966 likes
- 🤗 facebook/w2v-bert-2.0 (model) · 2.3M downloads · 207 likes
- 🤗 espnet/xeus (model) · 33 downloads · 146 likes
- 🤗 facebook/seamless-m4t-medium (model) · 135 likes
- 🤗 facebook/seamless-m4t-large (model) · 513 likes
- 🤗 facebook/seamless-expressive (model) · 187 likes
- 🤗 facebook/seamless-streaming (model) · 281 likes
- 🤗 Aspik101/w2v-bert-2.0-polish-CV16.0 (model) · 8 downloads · 2 likes
- 🤗 ArthurMalajyan/seamless-m4t-v2-large-asr-hyw (model) · 2 likes
- 🤗 WueNLP/seamless-m4t-v2-large-speech-encoder (model) · 2.3k downloads · 9 likes
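
As a rough illustration of how the `facebook/seamless-m4t-v2-large` checkpoint might be used, here is a minimal sketch for text-to-text translation through the Hugging Face `transformers` library. It assumes a recent `transformers` release that ships `SeamlessM4Tv2Model`, with `torch` and `sentencepiece` installed. The helper name `translate_text` and the default language codes are illustrative choices, not something taken from the paper.

```python
MODEL_ID = "facebook/seamless-m4t-v2-large"  # checkpoint ID from the model list


def translate_text(text, src_lang="eng", tgt_lang="fra", model_id=MODEL_ID):
    """Translate text to text. `translate_text` is a hypothetical helper name."""
    # Imported lazily so that defining the helper does not need the heavy dependencies.
    from transformers import AutoProcessor, SeamlessM4Tv2Model

    processor = AutoProcessor.from_pretrained(model_id)
    # Downloads several GB of weights on first use.
    model = SeamlessM4Tv2Model.from_pretrained(model_id)

    # Tokenize the source text; src_lang is a three-letter language code.
    inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")

    # generate_speech=False requests text tokens instead of a synthesized waveform.
    output_tokens = model.generate(**inputs, tgt_lang=tgt_lang, generate_speech=False)
    return processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
```

Calling `translate_text("Hello, how are you?")` would load the model and return a French string. Dropping `generate_speech=False` makes `generate` return audio instead, which needs different post-processing than the decode step shown here.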
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
