F5-TTS-RO: Extending F5-TTS to Romanian TTS via Lightweight Input Adaptation

Radu-Gabriel Chivereanu; Tiberiu Boros

arXiv:2512.12297·cs.CL·December 16, 2025

Open Access

TL;DR

This paper presents a lightweight adaptation method for extending the F5-TTS text-to-speech model to support Romanian, preserving original capabilities while enabling natural Romanian speech synthesis with minimal retraining.

Contribution

Introduces an input-level adapter that adds Romanian support to F5-TTS: a sub-network trained on Romanian text extends the text encoder's embedding matrix while the original weights stay frozen.

Findings

1. Maintains voice cloning capabilities in Romanian.

2. Enables code-switching between Romanian and English.

3. Achieves natural-sounding Romanian speech with a residual English accent.

Abstract

This work introduces a lightweight input-level adapter for the F5-TTS model that enables Romanian language support. To preserve the existing capabilities of the model (voice cloning, English and Chinese support), we keep the original weights frozen, append a sub-network to the model, and train it as an extension of the textual embedding matrix of the text encoder. For simplicity, we rely on the ConvNeXt module implemented in F5-TTS to also model the co-dependencies between the new character-level embeddings. The module serves as a "soft" letter-to-sound layer, converting Romanian text into a continuous representation that the F5-TTS model uses to produce natural-sounding Romanian utterances. We evaluate the model with a pool of 20 human listeners across three tasks: (a) audio similarity between reference and generated speech, (b) pronunciation and naturalness and (c) Romanian-English…
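
The idea of extending a frozen embedding matrix can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the class `ExtendedEmbedding`, its methods, the vocabularies, the random stand-in for pretrained weights, and the hand-written gradients are all hypothetical, and the ConvNeXt module that the abstract says models co-dependencies between characters is omitted. The sketch only shows that new Romanian characters get trainable rows while the original rows are never updated.

```python
import random
import unicodedata


def nfc(text):
    """Normalize to NFC so each Romanian diacritic is a single code point."""
    return unicodedata.normalize("NFC", text)


class ExtendedEmbedding:
    """Hypothetical sketch: frozen base table plus trainable rows for new symbols."""

    def __init__(self, base_vocab, new_symbols, dim, seed=0):
        rng = random.Random(seed)
        base_vocab, new_symbols = nfc(base_vocab), nfc(new_symbols)
        symbols = list(base_vocab) + list(new_symbols)
        if len(set(symbols)) != len(symbols):
            raise ValueError("base_vocab and new_symbols must not overlap")
        self.index = {ch: i for i, ch in enumerate(symbols)}
        self.n_base = len(base_vocab)
        # Assumption: random values stand in for the pretrained embedding rows.
        self.base = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                     for _ in range(self.n_base)]
        # Extension rows for the new characters start at zero and are trainable.
        self.ext = [[0.0] * dim for _ in new_symbols]

    def row(self, ch):
        """Return the embedding row for one character."""
        i = self.index[nfc(ch)]
        return self.base[i] if i < self.n_base else self.ext[i - self.n_base]

    def lookup(self, text):
        """Return one embedding vector (a copy) per character of the text."""
        return [list(self.row(ch)) for ch in nfc(text)]

    def sgd_step(self, text, grads, lr=0.1):
        """Apply per-character gradients, skipping the frozen base rows."""
        for ch, grad in zip(nfc(text), grads):
            i = self.index[ch]
            if i < self.n_base:
                continue  # original weights stay frozen
            ext_row = self.ext[i - self.n_base]
            for k, g in enumerate(grad):
                ext_row[k] -= lr * g


emb = ExtendedEmbedding("abcdefghijklmnopqrstuvwxyz ", "ăâîșț", dim=4)
base_before = [list(r) for r in emb.base]

text = nfc("țară")
grads = [[1.0] * 4 for _ in text]  # made-up gradients, one per character
emb.sgd_step(text, grads)

print(emb.base == base_before)  # the base rows are untouched
print(emb.row("ț"))             # the extension row has moved
```

In this sketch only the rows for "ț" and "ă" change after the update; the rows for "a" and "r" belong to the frozen base table and are skipped, which mirrors the stated goal of preserving the existing capabilities of the model.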

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research