Efficient Compression of Multitask Multilingual Speech Models

Thomas Palmeira Ferraz

arXiv:2405.00966·cs.CL·May 3, 2024

Open Access

TL;DR

This paper introduces DistilWhisper, a novel compression method for multilingual speech models that improves recognition accuracy for under-represented languages while maintaining model robustness and efficiency.

Contribution

It proposes a dual strategy of lightweight fine-tuning with language-specific experts and knowledge distillation to enhance multilingual speech model performance.
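The knowledge distillation half of this strategy can be illustrated with a generic teacher-student objective. The sketch below is a minimal, standard-library example and not the paper's implementation: the page does not state which divergence, temperature, or loss weighting DistilWhisper uses, so the KL divergence, the `alpha` weight, the `temperature` value, and all function names here are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, target_index, eps=1e-12):
    """Negative log-likelihood of the reference token."""
    return -math.log(probs[target_index] + eps)


def distillation_loss(student_logits, teacher_logits, target_index,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of a supervised term and a teacher-matching term.

    alpha and temperature are hypothetical hyperparameters, not values
    taken from the paper.
    """
    student_soft = softmax(student_logits, temperature)
    teacher_soft = softmax(teacher_logits, temperature)
    # The temperature**2 factor is the usual rescaling for softened targets.
    kd_term = kl_divergence(teacher_soft, student_soft) * temperature ** 2
    ce_term = cross_entropy(softmax(student_logits), target_index)
    return alpha * ce_term + (1 - alpha) * kd_term


if __name__ == "__main__":
    # Toy three-token vocabulary; the teacher is more confident than the student.
    student = [1.0, 0.5, 0.2]
    teacher = [3.0, 0.1, 0.1]
    print(distillation_loss(student, teacher, target_index=0))
```

When the student's logits match the teacher's, the distillation term vanishes and only the supervised term remains, so the loss pulls the smaller model toward the larger model's output distribution while still fitting the reference transcript.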

Findings

01. DistilWhisper outperforms standard fine-tuning and LoRA adapters.

02. It improves ASR accuracy for low-resource languages.

03. The approach introduces negligible parameter overhead.

Abstract

Whisper is a multitask and multilingual speech model covering 99 languages. It yields commendable automatic speech recognition (ASR) results in a subset of its covered languages, but the model still underperforms on a non-negligible number of under-represented languages, a problem exacerbated in smaller model versions. In this work, we examine its limitations, demonstrating the presence of speaker-related (gender, age) and model-related (resourcefulness and model size) biases. We nevertheless show that only the model-related biases are amplified by quantization, which affects low-resource languages and smaller models more strongly. Searching for a better compression approach, we propose DistilWhisper, an approach that bridges the ASR performance gap for these languages while retaining the advantages of multitask and multilingual capabilities. Our approach involves two key strategies:…

Taxonomy

Topics: Advanced Data Compression Techniques · Speech Recognition and Synthesis

Methods: Knowledge Distillation