One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

Amanuel Gizachew Abebe; Yasmin Moslem

arXiv:2604.26136·eess.AS·April 30, 2026

TL;DR

This paper evaluates and improves cross-lingual voice cloning for scientific speech in Arabic, Chinese, and French, using the OmniVoice model and data augmentation to enhance intelligibility and speaker similarity.

Contribution

It introduces a system for cross-lingual voice cloning in scientific speech, leveraging data augmentation and ensemble distillation to improve performance across multiple languages.

Findings

01. Synthetic data via ensemble distillation improves intelligibility.

02. Fine-tuning with augmented data enhances speaker similarity.

03. The approach performs well across Arabic, Chinese, and French.

Abstract

Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scientific communication. In this paper, we address this challenge through our system submission to the Cross-Lingual Voice Cloning shared task at the International Conference on Spoken Language Translation (IWSLT 2026). First, we evaluate several state-of-the-art voice cloning models for cross-lingual speech generation of scientific texts in Arabic, Chinese, and French. Then, we build voice cloning systems based on the OmniVoice foundation model. We employ data augmentation via multi-model ensemble distillation from the ACL 60/60 corpus. We investigate the effect of using this synthetic data for fine-tuning, demonstrating consistent improvements in intelligibility (WER and CER) across languages…
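The abstract reports intelligibility as WER (word error rate) and CER (character error rate). As a rough illustration of what these metrics measure, the sketch below computes both from a reference text and a transcript of the synthesized speech, using a plain Levenshtein distance. It is not the paper's evaluation pipeline: the function names are hypothetical, no text normalization is applied, and ignoring spaces in CER is an assumption made here. CER is the usual choice for languages such as Chinese, where whitespace does not delimit words.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are ignored (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Illustrative strings only, not data from the paper.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("语音克隆", "语音克龙"))
```

In practice, the hypothesis would come from running an ASR system on the cloned speech, and both strings would be normalized (casing, punctuation, numerals) before scoring. Lower values indicate more intelligible output.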
