Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition
Jash Rathod, Nauman Dawalatabad, Shatrughan Singh, Dhananjaya Gowda

TL;DR
This paper introduces a multi-stage knowledge distillation method to compress conformer transducer models for on-device speech recognition, achieving over 60% size reduction with minimal performance loss.
Contribution
It proposes a novel multi-stage progressive compression approach using knowledge distillation for conformer transducer models in speech recognition.
Findings
Achieved over 60% model size reduction.
Maintained performance close to larger models.
Validated on the LibriSpeech dataset.
Abstract
The limited memory bandwidth of smart devices prompts the development of smaller Automatic Speech Recognition (ASR) models. To obtain a smaller model, one can employ model compression techniques. Knowledge distillation (KD) is a popular model compression approach that has been shown to achieve a smaller model size with relatively little degradation in model performance. In this approach, knowledge is distilled from a trained, large teacher model into a smaller student model. Transducer-based models have also recently been shown to perform well on the on-device streaming ASR task, while conformer models are efficient at handling long-term dependencies. Hence, in this work we employ a streaming transducer architecture with a conformer as the encoder. We propose a multi-stage progressive approach to compress the conformer transducer model using KD. We progressively update our teacher…
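The teacher-to-student distillation described in the abstract can be illustrated with a small sketch. It shows the generic temperature-softened KL-divergence form of a KD loss, which is not necessarily the loss used in the paper, since the abstract does not specify it. The names `softmax`, `kd_loss`, and `progressive_stages`, and the default temperature, are all hypothetical. The stage pairing assumes one reading of "multi-stage progressive": each stage's student becomes the next stage's teacher.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Generic KD objective (assumption): the paper's exact loss is not
    given in the abstract. The common T^2 scaling factor is omitted.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def progressive_stages(param_counts):
    """Pair each model size with the next smaller one as (teacher, student).

    Assumption: the student of one stage serves as the teacher of the next.
    """
    return list(zip(param_counts[:-1], param_counts[1:]))
```

With made-up sizes, `progressive_stages([100, 60, 40])` yields the stage pairs `(100, 60)` and `(60, 40)`. `kd_loss` is zero when the student's logits match the teacher's and positive otherwise.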
Taxonomy
Methods: Knowledge Distillation
