KIT's Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization

Zhaolin Li; Yining Liu; Danni Liu; Tuan Nam Nguyen; Enes Yavuz Ugan; Tu Anh Dinh; Carlos Mullov; Alexander Waibel; Jan Niehues

arXiv:2505.19679 · cs.CL · January 29, 2026

TL;DR

This paper introduces system enhancements for low-resource speech translation using synthetic data, model regularization, and system combination techniques, leading to improved performance across multiple language pairs.

Contribution

We explore synthetic data augmentation, intra-distillation as a form of model regularization, and system combination to improve low-resource speech translation systems.
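
The page names intra-distillation but does not define it. The sketch below assumes one common formulation: the same input is passed through the model K times with different dropout masks, and a regularizer pulls the K output distributions toward each other. Here that regularizer is the average symmetric KL divergence between each pass and their mean, added to the usual cross-entropy with a weight `alpha`. The function names, `alpha`, and the toy distributions are hypothetical and illustrative; this is not the authors' implementation.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def intra_distillation_term(passes: Sequence[Sequence[float]]) -> float:
    """Assumed regularizer: mean symmetric KL between each of K dropout passes and their average."""
    k = len(passes)
    mean = [sum(col) / k for col in zip(*passes)]
    return sum(kl_divergence(p, mean) + kl_divergence(mean, p) for p in passes) / k


def total_loss(passes: Sequence[Sequence[float]], target: int, alpha: float = 1.0) -> float:
    """Mean cross-entropy over the K passes plus the weighted consistency term."""
    nll = sum(-math.log(p[target] + 1e-12) for p in passes) / len(passes)
    return nll + alpha * intra_distillation_term(passes)


# Hypothetical output distributions for one target token from K = 3 dropout passes.
passes = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
print(intra_distillation_term(passes))
print(total_loss(passes, target=0, alpha=0.5))
```

In an actual ASR, MT, or ST model this term would be computed per token on the decoder outputs inside a deep learning framework; the plain-Python version only shows the arithmetic. When all passes agree, the term is zero and the loss reduces to ordinary cross-entropy.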

Findings

01. Synthetic data improves translation quality for low-resource languages.
02. Intra-distillation enhances model performance across tasks.
03. Combining systems yields an improvement of approximately 1.5 BLEU points.
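
How the cascaded and E2E systems are combined is not described on this page. One common approach, assumed here, is minimum Bayes risk (MBR) selection: pool the hypotheses from all systems and choose the one with the highest average utility against the others, i.e. the consensus candidate. The sketch uses a simple unigram F1 as a stand-in utility (BLEU or chrF would be typical choices); `unigram_f1`, `mbr_select`, and the example sentences are hypothetical.

```python
from collections import Counter
from typing import Callable, List


def unigram_f1(hyp: str, ref: str) -> float:
    """Stand-in utility: F1 over whitespace tokens (clipped unigram overlap)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: List[str], utility: Callable[[str, str], float] = unigram_f1) -> str:
    """Return the candidate with the highest mean utility against all other candidates."""
    best, best_score = candidates[0], float("-inf")
    for i, hyp in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        score = sum(utility(hyp, ref) for ref in others) / max(len(others), 1)
        if score > best_score:
            best, best_score = hyp, score
    return best


# Hypothetical outputs pooled from a cascaded system and an E2E system.
cascaded = ["the cat sat on the mat"]
e2e = ["the cat sat on a mat", "a dog ran"]
print(mbr_select(cascaded + e2e))
```

Pooling both systems lets a hypothesis from either side win whenever the other outputs support it, which is one way combination can outperform each individual system.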

Abstract

This paper presents KIT's submissions to the IWSLT 2025 low-resource track. We develop both cascaded systems, consisting of Automatic Speech Recognition (ASR) and Machine Translation (MT) models, and end-to-end (E2E) Speech Translation (ST) systems for three language pairs: Bemba, North Levantine Arabic, and Tunisian Arabic into English. Building upon pre-trained models, we fine-tune our systems with different strategies to utilize resources efficiently. This study further explores system enhancement with synthetic data and model regularization. Specifically, we investigate MT-augmented ST by generating translations from ASR data using MT models. For North Levantine, which lacks parallel ST training data, a system trained solely on synthetic data slightly surpasses the cascaded system trained on real data. We also explore augmentation using text-to-speech models by generating synthetic…
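
MT-augmented ST, as described above, turns ASR data (audio with source-language transcripts) into ST training data by translating each transcript with an MT model. A minimal sketch, assuming a batched `translate` callable; the record types, `mt_augment`, and `toy_translate` are hypothetical placeholders for whatever MT model and data format a system actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AsrExample:
    audio_path: str   # speech in the source language
    transcript: str   # source-language transcript


@dataclass
class StExample:
    audio_path: str
    translation: str  # English target produced by the MT model
    synthetic: bool   # True: the target side comes from MT, not a human translator


def mt_augment(asr_data: Iterable[AsrExample],
               translate: Callable[[List[str]], List[str]],
               batch_size: int = 32) -> List[StExample]:
    """Pair each ASR utterance's audio with a machine translation of its transcript."""
    data = [ex for ex in asr_data if ex.transcript.strip()]  # skip empty transcripts
    out: List[StExample] = []
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        hyps = translate([ex.transcript for ex in batch])
        out.extend(StExample(ex.audio_path, hyp, True) for ex, hyp in zip(batch, hyps))
    return out


def toy_translate(sentences: List[str]) -> List[str]:
    """Hypothetical stand-in for a real MT model."""
    return [f"<en> {s}" for s in sentences]


asr = [AsrExample("utt1.wav", "source sentence one"), AsrExample("utt2.wav", "  ")]
print(mt_augment(asr, toy_translate))
```

The `synthetic` flag keeps generated pairs distinguishable from real ST data, so they can be mixed with real data where it exists or, as for North Levantine, used on their own.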

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems