BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System
Matthew Raffel, Victor Agostinelli, Lizhong Chen

TL;DR
BeaverTalk is a cascaded speech translation system for IWSLT 2025 that chains a VAD segmenter, Whisper ASR, and a Gemma LLM fine-tuned with LoRAs to perform simultaneous translation on the English-German and English-Chinese tasks.
Contribution
This work introduces BeaverTalk, a novel cascaded speech translation system with fine-tuned LLMs and a conversational prompting strategy for improved simultaneous translation performance.
Findings
Achieved BLEU scores of 24.64 and 27.83 on English-German across the low and high latency regimes.
Achieved BLEU scores of 34.07 and 37.23 on English-Chinese across the low and high latency regimes.
Demonstrated effective integration of VAD, Whisper, and Gemma models for real-time translation.
Abstract
This paper discusses the construction, fine-tuning, and deployment of BeaverTalk, a cascaded system for speech-to-text translation as part of the IWSLT 2025 simultaneous translation task. The system architecture employs a VAD segmenter for breaking a speech stream into segments, Whisper Large V2 for automatic speech recognition (ASR), and Gemma 3 12B for simultaneous translation. Regarding the simultaneous translation LLM, it is fine-tuned via low-rank adaptors (LoRAs) for a conversational prompting strategy that leverages a single prior-sentence memory bank from the source language as context. The cascaded system participated in the English-to-German and English-to-Chinese language directions for both the low and high latency regimes. In particular, on the English-to-German task, the system achieves a BLEU of 24.64 and 27.83 at a StreamLAAL of 1837.86 and…
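The cascade described in the abstract (segment, transcribe, then translate with one prior source sentence as context) can be pictured with a minimal sketch. Everything named here is an assumption for illustration: `MemoryBank`, `build_prompt`, `translate_stream`, and the prompt wording are hypothetical, and the `transcribe` and `translate` callables stand in for Whisper and the fine-tuned Gemma model rather than calling them. The sketch works on whole segments and leaves out the simultaneous read/write policy.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class MemoryBank:
    """Hypothetical helper: keeps at most one prior source-language sentence."""
    prior_source: Optional[str] = None

    def update(self, sentence: str) -> None:
        # Overwrite rather than append, so only a single sentence is retained.
        self.prior_source = sentence


def build_prompt(memory: MemoryBank, source: str) -> str:
    """Assemble an illustrative prompt; the wording is assumed, not the paper's."""
    lines = []
    if memory.prior_source is not None:
        lines.append(f"Previous sentence: {memory.prior_source}")
    lines.append(f"Translate: {source}")
    return "\n".join(lines)


def translate_stream(
    segments: Iterable[str],
    transcribe: Callable[[str], str],
    translate: Callable[[str], str],
) -> List[str]:
    """Run each segment through ASR, then translation, carrying one sentence of context."""
    memory = MemoryBank()
    outputs = []
    for segment in segments:
        source = transcribe(segment)           # stand-in for the ASR stage
        prompt = build_prompt(memory, source)  # context = previous source sentence only
        outputs.append(translate(prompt))      # stand-in for the translation LLM
        memory.update(source)                  # current sentence becomes the next context
    return outputs
```

Passing the stages in as callables keeps the sketch independent of any particular ASR or LLM library; in a real system the segments would be audio chunks emitted by the VAD segmenter rather than strings.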
Taxonomy
Topics: Natural Language Processing Techniques
