Open-Source System for Multilingual Translation and Cloned Speech Synthesis

Mateo Cámara; Juan Gutiérrez; María Pilar Daza; José Luis Blanco

arXiv:2507.02530 · eess.AS · July 4, 2025

TL;DR

This paper introduces an open-source system that combines speech recognition, translation, and voice cloning to facilitate multilingual communication and speech regeneration, supporting diverse real-world applications.

Contribution

It presents a novel open-source pipeline integrating Whisper, LLMs, and TTS with voice cloning for multilingual translation and speech regeneration, emphasizing accessibility and local deployment.

Findings

1. System achieves real-time multilingual translation with high accuracy.

2. Voice cloning maintains speaker identity and naturalness.

3. Open-source components enable flexible, cost-effective deployment.

Abstract

We present an open-source system designed for multilingual translation and speech regeneration, addressing challenges in communication and accessibility across diverse linguistic contexts. The system integrates Whisper for speech recognition with Voice Activity Detection (VAD) to identify speaking intervals, followed by a pipeline of Large Language Models (LLMs). For multilingual applications, the first LLM segments speech into coherent, complete sentences, which a second LLM then translates. For speech regeneration, the system uses a text-to-speech (TTS) module with voice cloning capabilities to replicate the original speaker's voice, maintaining naturalness and speaker identity. The system's open-source components can operate locally or via APIs, offering cost-effective deployment across various use cases. These include real-time multilingual translation in Zoom sessions, speech…
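The stage ordering described in the abstract (VAD, speech recognition, LLM segmentation, LLM translation, TTS with voice cloning) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The energy-threshold VAD is a simple stand-in for whatever detector the system actually uses, and every name here (`frame_energy`, `speech_intervals`, `TranslationPipeline`, and its four callables) is hypothetical. The recognition, LLM, and TTS stages are passed in as plain functions, so no real Whisper, LLM, or TTS API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]  # one short window of audio samples (assumed representation)


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of a frame; 0.0 for an empty frame."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def speech_intervals(
    frames: Sequence[Frame], threshold: float, min_frames: int = 2
) -> List[Tuple[int, int]]:
    """Toy energy-based VAD: return [start, end) frame ranges whose energy
    stays at or above `threshold` for at least `min_frames` frames."""
    intervals: List[Tuple[int, int]] = []
    start = None
    for i, frame in enumerate(frames):
        if frame_energy(frame) >= threshold:
            if start is None:
                start = i  # a speaking interval begins
        elif start is not None:
            if i - start >= min_frames:
                intervals.append((start, i))
            start = None  # interval ended (or was too short to keep)
    if start is not None and len(frames) - start >= min_frames:
        intervals.append((start, len(frames)))
    return intervals


@dataclass
class TranslationPipeline:
    """Hypothetical wiring of the stages named in the abstract."""
    transcribe: Callable[[Sequence[Frame]], str]         # ASR stage (Whisper in the paper)
    segment: Callable[[str], List[str]]                  # first LLM: complete sentences
    translate: Callable[[str, str], str]                 # second LLM: (sentence, target language)
    synthesize: Callable[[str, Sequence[Frame]], bytes]  # TTS: (text, reference audio for cloning)

    def run(
        self, frames: Sequence[Frame], threshold: float, target_lang: str
    ) -> List[Tuple[str, bytes]]:
        results: List[Tuple[str, bytes]] = []
        for start, end in speech_intervals(frames, threshold):
            chunk = frames[start:end]
            text = self.transcribe(chunk)
            for sentence in self.segment(text):
                translated = self.translate(sentence, target_lang)
                # Assumption: the detected speech chunk doubles as the
                # voice-cloning reference for the synthesized output.
                results.append((translated, self.synthesize(translated, chunk)))
        return results
```

To try the sketch, construct `TranslationPipeline` with four stub functions (for example, a `segment` that splits on periods and a `translate` that tags each sentence with the target language) and call `run` on a list of frames. Keeping the stages as injected callables mirrors the abstract's point that components can run locally or via APIs, since either kind of backend fits the same function signature.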
