KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025

Sai Koneru; Maike Züfle; Thai-Binh Nguyen; Seymanur Akti; Jan Niehues; Alexander Waibel

arXiv:2505.13036 · cs.CL · May 20, 2025

Open Access · 3 Models · 1 Dataset

TL;DR

This paper presents KIT's submissions for IWSLT 2025, utilizing large language models to improve offline speech translation and instruction following through multi-stage pipelines and end-to-end models with contextual refinement.

Contribution

It introduces novel pipeline and end-to-end models that leverage LLMs for enhanced speech translation and instruction following, incorporating document-level context and refinement stages.

Findings

01. Improved translation quality with LLM-based fusion and refinement.
02. Effective instruction following with an integrated speech encoder and LLM.
03. Enhanced performance through contextual document-level processing.

Abstract

The scope of the International Workshop on Spoken Language Translation (IWSLT) has recently broadened beyond traditional Speech Translation (ST) to encompass a wider array of tasks, including Speech Question Answering and Summarization. This shift is partly driven by the growing capabilities of modern systems, particularly with the success of Large Language Models (LLMs). In this paper, we present the Karlsruhe Institute of Technology's submissions for the Offline ST and Instruction Following (IF) tracks, where we leverage LLMs to enhance performance across all tasks. For the Offline ST track, we propose a pipeline that employs multiple automatic speech recognition systems, whose outputs are fused using an LLM with document-level context. This is followed by a two-step translation process, incorporating an additional refinement step to improve translation quality. For the IF track, we…
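
The Offline ST pipeline described in the abstract (several ASR hypotheses fused by an LLM with document-level context, then translation followed by a refinement pass) might be sketched roughly as below. This is only an illustrative outline, not the authors' implementation: the names `fuse_hypotheses`, `run_pipeline`, `translate`, `refine`, the prompt wording, and the `context_size` parameter are all hypothetical, and the LLM, translation, and refinement components are passed in as plain callables so that no particular model or library is assumed.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


def fuse_hypotheses(llm: LLM, hypotheses: Sequence[str], context: Sequence[str]) -> str:
    """Ask the LLM to merge several ASR hypotheses for one segment.

    Previously fused segments are shown as document-level context.
    The prompt format is an assumption made for this sketch.
    """
    numbered = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(hypotheses))
    prompt = (
        "Previous transcript:\n" + "\n".join(context) + "\n\n"
        "Candidate transcripts:\n" + numbered + "\n\n"
        "Fused transcript:"
    )
    return llm(prompt).strip()


def run_pipeline(
    llm: LLM,
    asr_outputs: Sequence[Sequence[str]],
    translate: Callable[[str], str],
    refine: Callable[[str, str], str],
    context_size: int = 3,
) -> List[str]:
    """Fuse, translate, and refine each segment of one document in order.

    asr_outputs[i] holds the hypotheses of all ASR systems for segment i.
    """
    fused_doc: List[str] = []
    translations: List[str] = []
    for hypotheses in asr_outputs:
        # Only the most recent fused segments are used as context.
        context = fused_doc[-context_size:] if context_size > 0 else []
        fused = fuse_hypotheses(llm, hypotheses, context)
        fused_doc.append(fused)
        draft = translate(fused)                   # step 1: initial translation
        translations.append(refine(fused, draft))  # step 2: refinement pass
    return translations
```

Keeping the components as injected callables means each stage can be replaced independently, for example by swapping the stub translation function used in testing for a real model call.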

Code & Models

Models

Datasets

maikezu/data-kit-sub-iwslt2025-if-long-constraint
dataset · 6 dl

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems