IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation

Bhavana Akkiraju; Aishwarya Pothula; Santosh Kesiraju; Anil Kumar Vuppala

arXiv:2506.04714 · cs.CL · June 6, 2025

TL;DR

This paper describes the IIITH-BUT submission for low-resource Bhojpuri-to-Hindi speech translation, which fine-tunes the SeamlessM4T model and studies how hyperparameter tuning, data augmentation (speed perturbation and SpecAugment), and joint training with Marathi speech data affect translation quality.

Contribution

The paper systematically investigates hyperparameter optimization, data augmentation, and cross-lingual joint training when fine-tuning SeamlessM4T for a low-resource speech translation task.

Findings

01. Hyperparameter tuning improves translation quality.

02. Data augmentation techniques like speed perturbation and SpecAugment are effective.

03. Cross-lingual training with Marathi enhances performance.

Abstract

This paper presents the submission of IIITH-BUT to the IWSLT 2025 shared task on speech translation for the low-resource Bhojpuri-Hindi language pair. We explored the impact of hyperparameter optimisation and data augmentation techniques on the performance of the SeamlessM4T model fine-tuned for this specific task. We systematically investigated a range of hyperparameters including learning rate schedules, number of update steps, warm-up steps, label smoothing, and batch sizes; and report their effect on translation quality. To address data scarcity, we applied speed perturbation and SpecAugment and studied their effect on translation quality. We also examined the use of cross-lingual signal through joint training with Marathi and Bhojpuri speech data. Our experiments reveal that careful selection of hyperparameters and the application of simple yet effective augmentation techniques…
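
The two augmentation techniques named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: the function names, the linear-interpolation resampling, the single mask per axis, and the default mask widths are all assumptions chosen for illustration, and a real system would typically use an audio toolkit instead.

```python
import random


def speed_perturb(samples, factor):
    """Resample a list of audio samples by linear interpolation.

    factor > 1 shortens the signal (faster speech); factor < 1 lengthens it.
    Illustrative only: the resampling method is an assumption, not the paper's.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_in = len(samples)
    n_out = max(1, round(n_in / factor))
    out = []
    for i in range(n_out):
        pos = min(i * factor, n_in - 1)  # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out


def spec_augment(spec, rng, max_freq_width=8, max_time_width=20):
    """Zero one random frequency band and one random time span.

    `spec` is a list of rows indexed as spec[freq][time]. The mask widths are
    hypothetical defaults, not values reported in the paper.
    """
    out = [row[:] for row in spec]  # copy so the input is left untouched
    n_freq, n_time = len(out), len(out[0])

    # Frequency mask: blank a contiguous band of rows.
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [0.0] * n_time

    # Time mask: blank a contiguous span of columns in every row.
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = 0.0
    return out


if __name__ == "__main__":
    wave = [float(i % 100) for i in range(16000)]
    print(len(speed_perturb(wave, 1.1)), len(speed_perturb(wave, 0.9)))
    masked = spec_augment([[1.0] * 200 for _ in range(80)], random.Random(0))
    print(sum(v == 0.0 for row in masked for v in row), "masked bins")
```

Speed factors such as 0.9 and 1.1 are a common convention in speech recipes and are used here only as an example; the abstract does not state which factors were applied. Resampling in this way changes pitch along with tempo, which is the usual behaviour of speed perturbation.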

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
