Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation
Chenyang Le, Yinfeng Xia, Huiyan Li, Manhong Wang, Yutao Sun, Xingyang Ma, Yanmin Qian

TL;DR
This paper introduces a dual-scale modeling approach for multilingual speech translation that combines model compression, knowledge distillation, and a novel KVSPN module, achieving state-of-the-art performance with significantly improved inference efficiency.
Contribution
It presents the Parasitic Dual-Scale Approach, integrating KVSPN and distillation techniques to enhance multilingual speech translation models' efficiency and accuracy.
Findings
Achieved state-of-the-art results across six languages.
KVSPN alone yields a 40% speedup with no BLEU score degradation.
Combined with distillation, the approach yields a 2.6× speedup over Whisper Medium.
Abstract
Recent advancements in speech-to-text translation have led to the development of multilingual models capable of handling multiple language pairs simultaneously. However, these unified models often suffer from large parameter sizes, making it challenging to balance inference efficiency and performance, particularly in local deployment scenarios. We propose an innovative Parasitic Dual-Scale Approach, which combines an enhanced speculative sampling method with model compression and knowledge distillation techniques. Building on the Whisper Medium model, we enhance it for multilingual speech translation into whisperM2M, and integrate our novel KVSPN module, achieving state-of-the-art (SOTA) performance across six popular languages with improved inference efficiency. KVSPN enables a 40% speedup with no BLEU score degradation. Combined with distillation methods, it represents a 2.6…
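The abstract builds on speculative sampling but does not describe how KVSPN works, so the sketch below is not the authors' method. It illustrates only the generic greedy draft-and-verify idea that speculative decoding rests on: a cheap draft model proposes several tokens, and the large target model checks them in one verification pass, keeping the longest agreeing prefix. The names `toy_target`, `toy_draft`, and `speculative_greedy_decode` are hypothetical stand-ins, and the toy "models" are simple deterministic functions rather than neural networks.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to the next token (greedy).
NextToken = Callable[[List[int]], int]


def toy_target(ctx: List[int]) -> int:
    """Hypothetical large model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Hypothetical small model: agrees with the target except after token 2."""
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


def plain_greedy_decode(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(target(tokens))
    return tokens


def speculative_greedy_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding.

    Returns the tokens and the number of verification passes. In a real system
    each pass is a single batched forward pass of the target model over all k
    drafted positions; here it is simulated with per-position calls.
    """
    tokens = list(prompt)
    passes = 0
    while len(tokens) - len(prompt) < n:
        # 1) The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2) The target model verifies the proposal position by position.
        passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            accepted.append(expected)  # always keep the target's own token
            if proposal[i] != expected:
                break  # first disagreement: discard the rest of the draft
        else:
            # Every drafted token matched, so the same pass yields a bonus token.
            accepted.append(target(tokens + proposal))
        tokens.extend(accepted)
    return tokens[: len(prompt) + n], passes


prompt = [0]
num_new = 12
baseline = plain_greedy_decode(toy_target, prompt, num_new)
spec_tokens, passes = speculative_greedy_decode(toy_target, toy_draft, prompt, num_new)
print(spec_tokens == baseline, passes, num_new)
```

Because the target model's token is kept at every position, the output matches plain greedy decoding exactly, which is the sense in which a speculative scheme can speed up inference without changing translation quality. The actual saving depends on how often the draft agrees with the target and on the draft's own cost, neither of which this toy models.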
