MLLP-VRAIN UPV system for the IWSLT 2025 Simultaneous Speech Translation task

Jorge Iranzo-Sánchez; Javier Iranzo-Sánchez; Adrià Giménez; Jorge Civera; Alfons Juan

arXiv:2506.18828·cs.CL·June 24, 2025

TL;DR

This paper presents a modular, real-time speech translation system for long-form content that combines pre-trained models with adaptive strategies to balance translation quality and latency in the IWSLT 2025 challenge.

Contribution

It introduces a novel cascade system that adapts strong pre-trained models for streaming translation without extensive retraining, addressing long-form speech challenges.

Findings

01

Achieved a BLEU score of 31.96 on the ACL60/60 dataset.

02

Latency of 2.94 seconds, measured with non-computational-aware StreamLAAL.

03

Preliminary BLEU score of 29.8 on the IWSLT25Instruct test set.

Abstract

This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2025 Simultaneous Speech Translation track. Our submission addresses the unique challenges of real-time translation of long-form speech by developing a modular cascade system that adapts strong pre-trained models to streaming scenarios. We combine Whisper Large-V3-Turbo for ASR with the multilingual NLLB-3.3B model for MT, implementing lightweight adaptation techniques rather than training new end-to-end models from scratch. Our approach employs document-level adaptation with prefix training to enhance the MT model's ability to handle incomplete inputs, while incorporating adaptive emission policies including a wait-$k$ strategy and RALCP for managing the translation stream. Specialized buffer management techniques and segmentation strategies ensure coherent translations across long…
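The two emission policies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `wait_k_schedule` and `relaxed_common_prefix`, the token-level granularity, and the agreement threshold `gamma` are assumptions made for illustration. The first function follows the standard wait-$k$ definition, in which target token $t$ (0-indexed) is written once $\min(k + t, |x|)$ source tokens have been read. The second is one plausible reading of a relaxed-agreement longest common prefix over beam candidates, where a token is committed only while a fraction of at least `gamma` of all candidates agrees on it.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def wait_k_schedule(num_source: int, num_target: int, k: int) -> List[Tuple[str, int]]:
    """Return the READ/WRITE action sequence of a wait-k policy (illustrative)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    actions: List[Tuple[str, int]] = []
    read = written = 0
    while written < num_target:
        # Target token `written` may be emitted once min(k + written, num_source)
        # source tokens have been read.
        needed = min(k + written, num_source)
        if read < needed:
            actions.append(("READ", read))
            read += 1
        else:
            actions.append(("WRITE", written))
            written += 1
    return actions


def relaxed_common_prefix(candidates: Sequence[Sequence[str]], gamma: float) -> List[str]:
    """Longest prefix on which at least a `gamma` fraction of all candidates agrees.

    Hypothetical helper: an assumed voting scheme, not the paper's exact rule.
    """
    prefix: List[str] = []
    alive = list(candidates)  # candidates still consistent with the prefix
    total = len(candidates)
    pos = 0
    while True:
        votes = Counter(c[pos] for c in alive if len(c) > pos)
        if not votes:
            break  # every remaining candidate is exhausted
        token, count = votes.most_common(1)[0]
        if count / total < gamma:
            break  # agreement too weak to commit this token
        prefix.append(token)
        alive = [c for c in alive if len(c) > pos and c[pos] == token]
        pos += 1
    return prefix
```

With `gamma = 1.0` the second function reduces to a strict longest common prefix, so lowering `gamma` trades some stability of the emitted prefix for lower latency. In a cascade such as the one described, either policy would sit between the MT hypotheses and the output stream, deciding how much of the current hypothesis is safe to emit.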
