High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse, Laurent Mazaré, Edouard Grave, Patrick Pérez, Alexandre Défossez, Neil Zeghidour

TL;DR
Hibiki is a novel decoder-only model for simultaneous speech translation. It processes source and target speech in real time and produces high-quality, natural translations with adaptive timing, while remaining feasible to run on device.
Contribution
The paper introduces Hibiki, a model that performs real-time speech-to-speech and speech-to-text translation by modeling source and target streams jointly in a multistream language model. It also introduces a weakly-supervised method for optimizing per-word translation delays.
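In a multistream model, the source and target streams share one timeline, and at each step the model sees only the past and present. The sketch below illustrates what a single aligned training example could look like under several assumptions: a shared frame grid, at most one target word per frame, and codec tokens represented as plain integer lists. `Frame`, `build_streams`, `visible_source` and `PAD` are hypothetical names. They do not come from the Hibiki code base, and the sketch shows only the data layout, not the model.

```python
from dataclasses import dataclass

PAD = "<pad>"  # hypothetical placeholder for frames where no target text is emitted


@dataclass
class Frame:
    """One time step shared by all streams (names are illustrative)."""
    source_audio: list[int]  # codec tokens of the source speech at this step
    target_text: str         # target text token emitted at this step, or PAD
    target_audio: list[int]  # codec tokens of the target speech at this step


def build_streams(source_audio, target_words, word_frames, target_audio):
    """Place source audio, target text and target audio on one shared timeline.

    source_audio, target_audio: per-frame lists of codec tokens (same length).
    target_words: target-language words, in order.
    word_frames: frame index at which each target word is emitted; must be
        strictly increasing (one word per frame, in translation order).
    """
    n_frames = len(source_audio)
    if len(target_audio) != n_frames:
        raise ValueError("source and target audio must have the same number of frames")
    if len(target_words) != len(word_frames):
        raise ValueError("need exactly one frame index per target word")
    if any(b <= a for a, b in zip(word_frames, word_frames[1:])):
        raise ValueError("word_frames must be strictly increasing")

    text = [PAD] * n_frames
    for word, t in zip(target_words, word_frames):
        if not 0 <= t < n_frames:
            raise ValueError(f"word {word!r} falls outside the timeline")
        text[t] = word
    return [Frame(s, w, a) for s, w, a in zip(source_audio, text, target_audio)]


def visible_source(frames, t):
    """Source frames available when generating step t: only past and present."""
    return [f.source_audio for f in frames[: t + 1]]


if __name__ == "__main__":
    frames = build_streams(
        source_audio=[[11, 12], [13, 14], [15, 16], [17, 18]],
        target_words=["hello", "world"],
        word_frames=[1, 3],
        target_audio=[[0, 0], [0, 0], [21, 22], [23, 24]],
    )
    print([f.target_text for f in frames])  # ['<pad>', 'hello', '<pad>', 'world']
    print(visible_source(frames, 1))        # [[11, 12], [13, 14]]
```

In this sketch, the delays that a delay-optimization method picks reduce to the `word_frames` indices. Shifting a word later gives the model more source context before it has to commit to that word.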
Findings
Achieves state-of-the-art translation quality on French-English tasks.
Demonstrates high speaker fidelity and naturalness in translations.
Supports real-time, on-device deployment with simple inference.
Abstract
We introduce Hibiki, a decoder-only model for simultaneous speech translation. Hibiki leverages a multistream language model to synchronously process source and target speech, and jointly produces text and audio tokens to perform speech-to-text and speech-to-speech translation. We furthermore address the fundamental challenge of simultaneous interpretation. Unlike its consecutive counterpart, where one waits for the end of the source utterance before translating, simultaneous interpretation adapts its flow to accumulate just enough context to produce a correct translation in real time, chunk by chunk. To do so, we introduce a weakly-supervised method that leverages the perplexity of an off-the-shelf text translation system to identify optimal delays on a per-word basis and create aligned synthetic data. After supervised training, Hibiki performs adaptive, simultaneous speech translation with vanilla…
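The abstract states only that per-word delays are derived from the perplexity of an off-the-shelf text translation system. The sketch below shows one way such scores could become delays. It assumes a precomputed matrix `logp[i][j]` holding the log-probability of target word i given the first j source words and the preceding target words. It also assumes a criterion of our own choosing: each word is aligned to the source word whose reveal increases its log-probability the most. The paper's exact rule, any thresholds and any smoothing may differ. `contextual_alignment`, `emit_times` and `lag_s` are hypothetical names, and the scores in the example are toy numbers, not model outputs.

```python
def contextual_alignment(logp):
    """Assign each target word the number of source words it needs (1..m).

    logp[i][j] is assumed to be log p(y_i | x_1..x_j, y_<i), scored by an
    off-the-shelf text translation model, for j = 0 (no source) .. m (full source).
    The argmax-of-gain criterion below is an assumption, not the paper's rule.
    """
    if not logp:
        return []
    width = len(logp[0])
    if width < 2 or any(len(row) != width for row in logp):
        raise ValueError("every row needs the same length, covering j = 0..m with m >= 1")

    alignment = []
    needed = 1  # running maximum keeps the alignment monotonic
    for row in logp:
        # Gain in log-probability from revealing source word j (j = 1..m).
        gains = [row[j] - row[j - 1] for j in range(1, width)]
        best = max(range(len(gains)), key=gains.__getitem__) + 1
        # A target word cannot be emitted before the target words preceding it.
        needed = max(needed, best)
        alignment.append(needed)
    return alignment


def emit_times(alignment, source_word_ends, lag_s=0.5):
    """Turn word alignments into target emission times in seconds.

    source_word_ends[k] is the end time of source word k + 1; lag_s is an
    illustrative safety margin, not a value taken from the paper.
    """
    return [source_word_ends[a - 1] + lag_s for a in alignment]


if __name__ == "__main__":
    # Toy scores for 3 target words over a 3-word source (columns j = 0..3).
    logp = [
        [-9.0, -1.0, -0.9, -0.8],  # jumps once source word 1 is heard
        [-7.0, -6.5, -6.0, -0.5],  # needs source word 3
        [-5.0, -1.0, -0.9, -0.9],  # alone needs word 1, but cannot precede target word 2
    ]
    alignment = contextual_alignment(logp)
    print(alignment)  # [1, 3, 3]
    print(emit_times(alignment, source_word_ends=[0.4, 0.9, 1.5]))
```

Enforcing monotonicity with a running maximum is a simple design choice. It guarantees the translation never reorders its own words in time, at the cost of occasionally delaying an "easy" word behind a harder one. Times produced this way could then guide where target speech is placed when building aligned synthetic data.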
Code & Models
- 🤗 kyutai/hibiki-zero-3b-pytorch-bf16 (model) · 702 downloads · 45 likes
- 🤗 kyutai/hibiki-1b-mlx-bf16 (model) · 34 downloads · 30 likes
- 🤗 kyutai/hibiki-2b-mlx-bf16 (model) · 12 downloads · 22 likes
- 🤗 kyutai/hibiki-2b-pytorch-bf16 (model) · 90 downloads · 61 likes
- 🤗 kyutai/hibiki-1b-pytorch-bf16 (model) · 169 downloads · 19 likes
- 🤗 kyutai/hibiki-1b-rs-bf16 (model) · 10 likes
- 🤗 kyutai/hibiki-2b-rs-bf16 (model) · 4 likes
- 🤗 kyutai/tts-1.6b-en_fr (model) · 31k downloads · 373 likes
- 🤗 kyutai/tts-0.75b-en-public (model) · 31k downloads · 15 likes
- 🤗 yapwithai/kyutai-tts-1.6b-en_fr (model) · 1 download
Taxonomy
Topics: Speech Recognition and Synthesis
