Seamless: Multilingual Expressive and Streaming Speech Translation

Seamless Communication; Loïc Barrault; Yu-An Chung; Mariano Coria Meglioli; David Dale; Ning Dong; Mark Duppenthaler; Paul-Ambroise Duquenne; Brian Ellis; Hady Elsahar; Justin Haaheim; John Hoffman; Min-Jae Hwang; Hirofumi Inaguma; Christopher Klaiber; Ilia Kulikov; Pengwei Li; Daniel Licht; Jean Maillard; Ruslan Mavlyutov; Alice Rakotoarison; Kaushik Ram Sadagopan; Abinesh Ramakrishnan; Tuan Tran; Guillaume Wenzek; Yilin Yang; Ethan Ye; Ivan Evtimov; Pierre Fernandez; Cynthia Gao; Prangthip Hansanti; Elahe Kalbassi; Amanda Kallet; Artyom Kozhevnikov; Gabriel Mejia Gonzalez; Robin San Roman; Christophe Touret; Corinne Wong; Carleigh Wood; Bokai Yu; Pierre Andrews; Can Balioglu; Peng-Jen Chen; Marta R. Costa-jussà; Maha Elbayad; Hongyu Gong; Francisco Guzmán; Kevin Heffernan; Somya Jain; Justine Kao; Ann Lee; Xutai Ma; Alex Mourachko; Benjamin Peloquin; Juan Pino; Sravya Popuri; Christophe Ropers; Safiyyah Saleem; Holger Schwenk; Anna Sun; Paden Tomasello; Changhan Wang; Jeff Wang; Skyler Wang; Mary Williamson

arXiv:2312.05187 · cs.CL · December 11, 2023 · 40 cites

TL;DR

This paper introduces Seamless, a system for real-time, multilingual, expressive speech translation that preserves vocal style and prosody. It also includes safety and bias mitigation features, with the aim of making machine-mediated communication across languages feel closer to human-to-human dialogue.

Contribution

The paper presents the Seamless system, which combines expressive, streaming, and multilingual translation models with safety and bias mitigation measures for real-time speech translation.

Findings

01

SeamlessM4T v2 improves on the multilingual, multimodal SeamlessM4T model, using the updated UnitY2 framework and more low-resource language training data.

02

SeamlessExpressive preserves vocal style and prosody, including speech rate and pauses, in translation.

03

SeamlessStreaming achieves low-latency, simultaneous translation for multiple languages.

Abstract

Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model: SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As…

Code & Models

Repositories

facebookresearch/seamless_communication (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques