Two-Pass End-to-End ASR Model Compression

Nauman Dawalatabad; Tushar Vatsal; Ashutosh Gupta; Sungsoo Kim; Shatrughan Singh; Dhananjaya Gowda; Chanwoo Kim

arXiv:2201.02741·eess.AS·January 11, 2022

TL;DR

This paper presents a knowledge distillation approach that compresses two-pass end-to-end speech recognition models for small devices, reducing model size by 55% while keeping WER comparable to the original model.

Contribution

It introduces a three-stage knowledge distillation method to effectively reduce the size of two-pass ASR models while preserving performance.

Findings

01 Achieves 55% model size reduction.

02 Maintains comparable WER to original models.

03 Demonstrates effectiveness on LibriSpeech dataset.

Abstract

Speech recognition on smart devices is challenging owing to their limited memory; hence, small ASR models are desirable. With the use of popular transducer-based models, it has become practical to deploy streaming speech recognition models on small devices [1]. Recently, the two-pass model [2], which combines RNN-T and LAS modules, has shown exceptional performance for streaming on-device speech recognition. In this work, we propose a simple and effective approach to reduce the size of the two-pass model for memory-constrained devices. We employ a popular knowledge distillation approach in three stages using the Teacher-Student training technique. In the first stage, we use a trained RNN-T model as a teacher model and perform knowledge distillation to train the student RNN-T model. The second stage uses the shared encoder and trains a LAS rescorer for the student model using…
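The abstract above does not spell out the distillation loss, so the following is only a generic sketch of Teacher-Student knowledge distillation on a single output distribution, not the paper's RNN-T or LAS formulation. The function names (`log_softmax`, `kd_loss`, `combined_loss`) and the `temperature` and `alpha` parameters are assumptions introduced for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between the two softened distributions.

    Hypothetical helper: a common distillation objective, assumed here
    rather than taken from the paper.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def combined_loss(task_loss, distill_loss, alpha=0.5):
    """Interpolate the student's own task loss with the distillation loss.

    alpha is an assumed weighting hyperparameter.
    """
    return (1.0 - alpha) * task_loss + alpha * distill_loss


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    d = kd_loss(teacher, student, temperature=2.0)
    print("distillation loss:", d)
    print("combined loss:", combined_loss(task_loss=1.2, distill_loss=d, alpha=0.3))
```

In a staged setup like the one described, a loss of this kind would be minimized while the teacher's parameters stay fixed and only the student is updated; how each stage applies it to the RNN-T and LAS modules is left to the paper.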

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Knowledge Distillation