DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model
Yanzhe Fu, Yueteng Kang, Songjun Cao, Long Ma

TL;DR
This paper proposes a two-stage knowledge distillation method that turns a large, non-streaming Wav2vec 2.0 teacher into DistillW2V2, a small streaming ASR model. The student is 8x faster and 12x smaller than the teacher, at the cost of some WER degradation across several datasets.
Contribution
The paper proposes a two-stage distillation method that produces a small, streaming W2V2-based ASR model from a large non-streaming teacher, making the model usable in low-resource and streaming scenarios.
Findings
DistillW2V2 is 8x faster and 12x smaller than the original teacher model.
At 480ms latency, its relative WER degradation ranges from 9% to 23.4%.
Experiments cover Gigaspeech, Librispeech, and an in-house dataset.
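For readers unfamiliar with the metric, relative WER degradation is conventionally the student's WER increase divided by the teacher's WER. The sketch below assumes that standard definition; the function name and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_wer_degradation(teacher_wer: float, student_wer: float) -> float:
    """Relative WER increase of the student over the teacher, in percent.

    Assumes the conventional definition:
        100 * (student_wer - teacher_wer) / teacher_wer
    """
    if teacher_wer <= 0:
        raise ValueError("teacher_wer must be positive")
    return 100.0 * (student_wer - teacher_wer) / teacher_wer


# Hypothetical WERs for illustration only (not results from the paper):
# a teacher at 10.0% WER and a student at 10.9% WER give roughly a 9% relative degradation.
example = relative_wer_degradation(teacher_wer=10.0, student_wer=10.9)
print(f"{example:.1f}% relative WER degradation")
```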
Abstract
Wav2vec 2.0 (W2V2) has shown impressive performance in automatic speech recognition (ASR). However, its large model size and non-streaming architecture make it hard to use in low-resource or streaming scenarios. In this work, we propose a two-stage knowledge distillation method to solve these two problems: the first step makes the big, non-streaming teacher model smaller, and the second step makes it streaming. Specifically, we adopt the MSE loss for the distillation of hidden layers and the modified LF-MMI loss for the distillation of the prediction layer. Experiments are conducted on Gigaspeech, Librispeech, and an in-house dataset. The results show that the final distilled student model (DistillW2V2) is 8x faster and 12x smaller than the original teacher model. For the 480ms latency setup, DistillW2V2's relative word error rate (WER) degradation…
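The hidden-layer part of the distillation described in the abstract can be illustrated with a minimal, framework-free sketch. It covers only the MSE term; the modified LF-MMI loss on the prediction layer is not shown. The layer mapping `layer_map`, the function names, and the assumption that student and teacher hidden states share the same frame count and dimension are all illustrative choices, not details confirmed by the paper.

```python
from typing import Sequence, Tuple

Frame = Sequence[float]   # one hidden vector
Layer = Sequence[Frame]   # hidden states of one layer: frames x dims


def layer_mse(student: Layer, teacher: Layer) -> float:
    """Mean squared error over all frames and dimensions of one layer."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of frames")
    total, count = 0.0, 0
    for s_frame, t_frame in zip(student, teacher):
        if len(s_frame) != len(t_frame):
            raise ValueError("hidden dimensions must match (assumed here)")
        for s, t in zip(s_frame, t_frame):
            total += (s - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty hidden states")
    return total / count


def hidden_distill_loss(
    student_layers: Sequence[Layer],
    teacher_layers: Sequence[Layer],
    layer_map: Sequence[Tuple[int, int]],
) -> float:
    """Average MSE over (student_index, teacher_index) layer pairs.

    layer_map is a hypothetical mapping; which teacher layers supervise
    which student layers is an assumption of this sketch.
    """
    if not layer_map:
        raise ValueError("layer_map must not be empty")
    losses = [layer_mse(student_layers[s], teacher_layers[t]) for s, t in layer_map]
    return sum(losses) / len(losses)


# Toy example: a 2-layer teacher and a 1-layer student, 2 frames x 2 dims each.
teacher = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
student = [[[1.0, 1.0], [1.0, 1.0]]]
print(hidden_distill_loss(student, teacher, layer_map=[(0, 1)]))
```

In a real training setup this term would be computed on tensors in a deep learning framework and combined with the prediction-layer loss; how the two are weighted is not stated in the text above.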
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
Methods: Test · Knowledge Distillation
