Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation

Chenyang Le; Yinfeng Xia; Huiyan Li; Manhong Wang; Yutao Sun; Xingyang Ma; Yanmin Qian

arXiv:2508.11189·cs.CL·August 18, 2025

TL;DR

This paper introduces a dual-scale modeling approach for multilingual speech translation that combines model compression, knowledge distillation, and a novel KVSPN module. It reports state-of-the-art performance across six languages, with KVSPN alone giving a 40% inference speedup at no BLEU cost.

Contribution

The paper presents the Parasitic Dual-Scale Approach, which integrates the KVSPN module with model compression and knowledge distillation to improve both the efficiency and the accuracy of multilingual speech translation models.

Findings

01 Achieved state-of-the-art results across six languages.

02 KVSPN delivers a 40% speedup with no BLEU score degradation.

03 Combined with distillation, the approach yields a 2.6× speedup over Whisper Medium.
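To make the two speedup figures easier to compare, the short sketch below converts each one into relative latency. It assumes "speedup" means baseline time divided by new time, and that a "40% speedup" corresponds to a factor of 1.4; the page does not define either convention, so treat both as assumptions. Each figure is relative to its own baseline, and the function name is illustrative.

```python
def relative_latency(speedup: float) -> float:
    """Fraction of the baseline time that remains after a given speedup factor.

    Assumes speedup = baseline_time / new_time (an assumption, not stated in the source).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# Assumed reading: "40% speedup" -> factor 1.4; "2.6x speedup" -> factor 2.6.
for label, factor in [("KVSPN alone", 1.4), ("combined vs. Whisper Medium", 2.6)]:
    print(f"{label}: {relative_latency(factor):.1%} of baseline time")
```

Under these assumptions, a 1.4× speedup leaves roughly 71% of the baseline time and a 2.6× speedup roughly 38%.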

Abstract

Recent advancements in speech-to-text translation have led to the development of multilingual models capable of handling multiple language pairs simultaneously. However, these unified models often suffer from large parameter sizes, making it challenging to balance inference efficiency and performance, particularly in local deployment scenarios. We propose an innovative Parasitic Dual-Scale Approach, which combines an enhanced speculative sampling method with model compression and knowledge distillation techniques. Building on the Whisper Medium model, we enhance it for multilingual speech translation into whisperM2M, and integrate our novel KVSPN module, achieving state-of-the-art (SOTA) performance across six popular languages with improved inference efficiency. KVSPN enables a 40% speedup with no BLEU score degradation. Combined with distillation methods, it represents a 2.6× …
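The abstract names an enhanced speculative sampling method but does not describe how KVSPN works internally. As background only, here is a minimal sketch of generic greedy speculative decoding, which is not the authors' KVSPN: a cheap draft model proposes a few tokens, and the large target model verifies them, keeping the longest agreeing prefix. The toy `target_next` and `draft_next` functions, and every other name here, are hypothetical stand-ins for real models.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # maps a token sequence to the next token


def speculative_greedy_decode(
    target_next: NextToken,
    draft_next: NextToken,
    prefix: List[int],
    max_new_tokens: int,
    draft_len: int = 4,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; output matches plain greedy decoding
    with target_next. Returns (tokens, number_of_verification_passes)."""
    out = list(prefix)
    goal = len(prefix) + max_new_tokens
    passes = 0
    while len(out) < goal:
        k = min(draft_len, goal - len(out))
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # 2. The target model verifies the proposal. A real model would score
        #    all k positions in one batched forward pass; it is simulated here
        #    with per-position calls and counted as a single pass.
        passes += 1
        accepted: List[int] = []
        for i in range(k):
            t = target_next(out + proposal[:i])
            accepted.append(t)  # always keep the target's own token
            if t != proposal[i]:
                break  # first mismatch: discard the rest of the draft
        out.extend(accepted)
    return out, passes


# Toy stand-ins (hypothetical): the target counts upward modulo 10,
# and the draft agrees except right after token 4.
def target_next(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def draft_next(seq: List[int]) -> int:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


tokens, passes = speculative_greedy_decode(target_next, draft_next, [0], max_new_tokens=12)
print(tokens, passes)
```

Because every emitted token is the target model's own greedy choice, the output is identical to decoding with the target alone; the saving comes from needing fewer target passes when the draft is usually right. This simplified version omits the extra "bonus" token that many implementations take from the target after a fully accepted draft.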
