HITSZ's End-To-End Speech Translation Systems Combining Sequence-to-Sequence Auto Speech Recognition Model and Indic Large Language Model for IWSLT 2025 in Indic Track

Xuchen Wei; Yangxin Wu; Yaoyin Zhang; Henglyu Liu; Kehai Chen; Xuefeng Bai; Min Zhang

arXiv:2507.19616·cs.CL·July 29, 2025


TL;DR

This paper introduces an end-to-end speech translation system that combines the Whisper ASR model with the Krutrim LLM for English-to-Indic and Indic-to-English translation. It reports average BLEU scores of 28.88 and 27.86 for the two directions, and explores Chain-of-Thought prompting as a way to improve translation quality.

Contribution

The paper presents a novel integration of pre-trained ASR and Indic-specific LLMs for low-resource speech translation, and investigates Chain-of-Thought prompting effects.

Findings

01

Achieved average BLEU scores of 28.88 (English-to-Indic) and 27.86 (Indic-to-English)

02

Chain-of-Thought prompting raised Tamil-to-English BLEU by 13.84 on outputs that could be parsed successfully

03

The model did not consistently adhere to the required CoT output format

Abstract

This paper presents HITSZ's submission for the IWSLT 2025 Indic track, focusing on speech-to-text translation (ST) for English-to-Indic and Indic-to-English language pairs. To enhance translation quality in this low-resource scenario, we propose an end-to-end system integrating the pre-trained Whisper automated speech recognition (ASR) model with Krutrim, an Indic-specialized large language model (LLM). Experimental results demonstrate that our end-to-end system achieved average BLEU scores of $28.88$ for English-to-Indic directions and $27.86$ for Indic-to-English directions. Furthermore, we investigated the Chain-of-Thought (CoT) method. While this method showed potential for significant translation quality improvements on successfully parsed outputs (e.g. a $13.84$ BLEU increase for Tamil-to-English), we observed challenges in ensuring the model consistently adheres to the required…
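The CoT gain above is reported only for outputs that parsed successfully, so the parse rate matters as much as the BLEU delta. The following is a minimal sketch of how such parsing and bookkeeping might look. The output format is an assumption: the paper's actual CoT template is not shown here, so the `Translation:` marker, `parse_cot_output`, and `parse_rate` are hypothetical names used purely for illustration.

```python
import re
from typing import List, Optional

# ASSUMPTION: the model is asked to finish its reasoning with a line of the
# form "Translation: <final translation>". This marker is hypothetical and
# is not taken from the paper.
FINAL_LINE = re.compile(r"^Translation:\s*(.+)$", re.MULTILINE)


def parse_cot_output(raw: str) -> Optional[str]:
    """Return the final translation, or None if the format was not followed."""
    matches = FINAL_LINE.findall(raw)
    if not matches:
        return None
    # If the marker appears several times, keep the last occurrence.
    return matches[-1].strip()


def parse_rate(outputs: List[str]) -> float:
    """Fraction of model outputs from which a translation could be extracted."""
    if not outputs:
        return 0.0
    parsed = sum(parse_cot_output(o) is not None for o in outputs)
    return parsed / len(outputs)
```

Reporting `parse_rate` alongside BLEU makes it clear how much of the test set a score computed on parsed outputs actually covers.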
