Giving Voice to the Constitution: Low-Resource Text-to-Speech for Quechua and Spanish Using a Bilingual Legal Corpus
John E. Ortega, Rodolfo Zevallos, Fabricio Carraro

TL;DR
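This paper develops a bilingual text-to-speech (TTS) pipeline for Quechua and Spanish using three state-of-the-art architectures (XTTS v2, F5-TTS, and DiFlow-TTS). It addresses Quechua data scarcity through cross-lingual transfer and produces high-quality synthesized speech for the articles of the Peruvian Constitution.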
Contribution
It introduces a unified multilingual TTS framework that leverages cross-lingual transfer to improve synthesis in low-resource Quechua while maintaining Spanish quality.
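The summary does not state how the unequal Spanish and Quechua datasets are combined during training. One common, generic way to keep a low-resource language from being drowned out in a shared multilingual model is temperature-based language sampling, where each language l is drawn with probability proportional to (n_l / N)^T. The sketch below illustrates that idea only. It is a swapped-in technique, not the authors' documented recipe, and the function names and corpus sizes are hypothetical.

```python
import random
from typing import Dict, List, Optional


def language_sampling_probs(sizes: Dict[str, float], temperature: float = 0.5) -> Dict[str, float]:
    """Temperature-scaled sampling: p_l is proportional to (n_l / N) ** temperature.

    temperature=1.0 samples in proportion to data size; values closer to 0
    flatten the distribution toward uniform, upsampling the smaller language.
    """
    if not 0.0 < temperature <= 1.0:
        raise ValueError("temperature must be in (0, 1]")
    total = sum(sizes.values())
    if total <= 0:
        raise ValueError("dataset sizes must sum to a positive value")
    weights = {lang: (n / total) ** temperature for lang, n in sizes.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


def sample_languages(probs: Dict[str, float], k: int, seed: Optional[int] = None) -> List[str]:
    """Draw the language of each of k training examples (e.g. one batch)."""
    rng = random.Random(seed)
    langs = list(probs)
    return rng.choices(langs, weights=[probs[lang] for lang in langs], k=k)


if __name__ == "__main__":
    # Hypothetical corpus sizes in hours; these are NOT figures reported by the paper.
    sizes = {"es": 100.0, "qu": 10.0}
    print(language_sampling_probs(sizes, temperature=1.0))  # proportional: qu ~ 0.09
    print(language_sampling_probs(sizes, temperature=0.5))  # flattened: qu ~ 0.24
    print(sample_languages(language_sampling_probs(sizes), k=8, seed=0))
```

With these made-up sizes, lowering the temperature from 1.0 to 0.5 raises Quechua's share of sampled examples from about 9% to about 24%, which is the kind of rebalancing that lets the smaller language benefit from shared parameters.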
Findings
Achieved high-quality speech synthesis for Quechua and Spanish.
Demonstrated effective cross-lingual transfer in TTS models.
Provided open resources, including trained checkpoints, inference code, and synthesized audio for each constitutional article.
Abstract
We present a unified pipeline for synthesizing high-quality Quechua and Spanish speech for the Peruvian Constitution using three state-of-the-art text-to-speech (TTS) architectures: XTTS v2, F5-TTS, and DiFlow-TTS. Our models are trained on independent Spanish and Quechua speech datasets with heterogeneous sizes and recording conditions, and leverage bilingual and multilingual TTS capabilities to improve synthesis quality in both languages. By exploiting cross-lingual transfer, our framework mitigates data scarcity in Quechua while preserving naturalness in Spanish. We release trained checkpoints, inference code, and synthesized audio for each constitutional article, providing a reusable resource for speech technologies in indigenous and multilingual contexts. This work contributes to the development of inclusive TTS systems for political and legal content in low-resource settings.
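The abstract describes synthesizing every constitutional article in both languages with each of the three architectures. A minimal sketch of that bookkeeping is shown below, assuming the article texts have already been split into a per-language dictionary keyed by article number. The model labels, the output file layout, and the `synthesize` callback are hypothetical placeholders. This is not the authors' released inference code; a real run would plug in each model's own inference API.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Sequence

# Hypothetical labels for the three architectures named in the abstract.
MODELS = ("xtts_v2", "f5_tts", "diflow_tts")


@dataclass(frozen=True)
class SynthesisJob:
    model: str
    lang: str  # ISO 639-1 code, e.g. "es" (Spanish) or "qu" (Quechua)
    article: int
    text: str
    out_path: Path


def build_jobs(
    articles: Dict[str, Dict[int, str]],
    out_dir: Path,
    models: Sequence[str] = MODELS,
) -> List[SynthesisJob]:
    """Create one job per (language, article, model), skipping empty articles."""
    jobs: List[SynthesisJob] = []
    for lang, by_article in sorted(articles.items()):
        for num, raw_text in sorted(by_article.items()):
            text = " ".join(raw_text.split())  # collapse line breaks and extra spaces
            if not text:
                continue
            for model in models:
                out_path = out_dir / model / lang / f"article_{num:03d}.wav"
                jobs.append(SynthesisJob(model, lang, num, text, out_path))
    return jobs


def run_jobs(
    jobs: List[SynthesisJob],
    synthesize: Callable[[str, str, str, Path], None],
) -> int:
    """Call a hypothetical synthesize(model, lang, text, out_path) for each job.

    The callback is responsible for running the model and writing the file.
    """
    for job in jobs:
        synthesize(job.model, job.lang, job.text, job.out_path)
    return len(jobs)


if __name__ == "__main__":
    articles = {
        "es": {1: "La defensa de la persona humana y el respeto de su dignidad "
                  "son el fin supremo de la sociedad y del Estado."},
        "qu": {1: "(Quechua text of Article 1, placeholder)"},
    }
    jobs = build_jobs(articles, Path("out"))
    n = run_jobs(jobs, lambda model, lang, text, path: print(model, lang, path))
    print(n, "jobs")
```

Keeping the model and language in each output path makes it straightforward to compare the three systems article by article, and to listen to the Spanish and Quechua renderings of the same article side by side.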
