MLLP-VRAIN UPV system for the IWSLT 2025 Simultaneous Speech Translation task
Jorge Iranzo-Sánchez, Javier Iranzo-Sánchez, Adrià Giménez, Jorge Civera, Alfons Juan

TL;DR
This paper presents a modular, real-time speech translation system for long-form content that combines pre-trained models with adaptive strategies to balance translation quality and latency in the IWSLT 2025 challenge.
Contribution
It introduces a novel cascade system that adapts strong pre-trained models for streaming translation without extensive retraining, addressing long-form speech challenges.
Findings
The system achieves a BLEU score of 31.96 on the ACL60/60 dataset.
On the same dataset, its non-computational-aware StreamLAAL latency is 2.94 seconds.
The final model obtains a preliminary BLEU score of 29.8 on the official test set (IWSLT25Instruct).
Abstract
This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2025 Simultaneous Speech Translation track. Our submission addresses the unique challenges of real-time translation of long-form speech by developing a modular cascade system that adapts strong pre-trained models to streaming scenarios. We combine Whisper Large-V3-Turbo for ASR with the multilingual NLLB-3.3B model for MT, implementing lightweight adaptation techniques rather than training new end-to-end models from scratch. Our approach employs document-level adaptation with prefix training to enhance the MT model's ability to handle incomplete inputs, while incorporating adaptive emission policies including a wait-k strategy and RALCP for managing the translation stream. Specialized buffer management techniques and segmentation strategies ensure coherent translations across long…
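The two emission policies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the token-level granularity, and the `threshold` parameter are assumptions made for illustration. The first function follows the standard wait-k rule (stay k source tokens ahead of the output). The second reflects a common reading of RALCP as a relaxed longest-common-prefix vote over beam candidates, which may differ from the variant used in the submission.

```python
from collections import Counter
from typing import List, Sequence


def wait_k_can_write(num_read: int, num_written: int, k: int,
                     source_finished: bool) -> bool:
    """Wait-k rule: write only while the reader is at least k tokens ahead.

    Once the source has ended, writing is always allowed so the
    remaining translation can be flushed.
    """
    return source_finished or (num_read - num_written) >= k


def ralcp_prefix(candidates: Sequence[Sequence[str]],
                 threshold: float) -> List[str]:
    """Longest prefix agreed on by at least `threshold` of all candidates.

    `threshold` (hypothetical name) is a fraction in (0, 1]; 1.0 reduces
    to a strict longest common prefix over the whole beam.
    """
    prefix: List[str] = []
    alive = list(candidates)            # candidates still matching the prefix
    needed = threshold * len(candidates)
    position = 0
    while alive:
        # Count which token the surviving candidates propose at this position.
        votes = Counter(c[position] for c in alive if len(c) > position)
        if not votes:
            break
        token, count = votes.most_common(1)[0]
        if count < needed:
            break                        # agreement too weak: stop emitting
        prefix.append(token)
        alive = [c for c in alive
                 if len(c) > position and c[position] == token]
        position += 1
    return prefix


if __name__ == "__main__":
    beam = [["the", "cat", "sat"], ["the", "cat", "sits"], ["the", "dog", "sat"]]
    print(wait_k_can_write(num_read=3, num_written=0, k=3, source_finished=False))
    print(ralcp_prefix(beam, threshold=0.6))
```

In this sketch a lower `threshold` emits longer prefixes sooner at the risk of committing to tokens that later hypotheses would revise, which is the quality and latency trade-off the abstract refers to.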
