Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data

Liang-Hsuan Tseng; Zih-Ching Chen; Wei-Shun Chang; Cheng-Kuang Lee; Tsung-Ren Huang; Hung-yi Lee

arXiv:2407.10603·eess.AS·July 16, 2024

TL;DR

This paper introduces K²D, a knowledge distillation framework that creates smaller, faster, and more effective code-switching ASR models using realistic data and auxiliary insights, outperforming baselines.

Contribution

The paper proposes K²D, a novel knowledge distillation method that leverages both teacher knowledge and auxiliary models for practical code-switching ASR.

Findings

01. K²D yields a model that is two times smaller and five times faster.

02. K²D outperforms the baseline methods and the teacher model.

03. The approach is evaluated on two in-domain and two out-of-domain datasets.

Abstract

Recent advances in automatic speech recognition (ASR) often rely on large speech foundation models for generating high-quality transcriptions. However, these models can be impractical due to limited computing resources. The situation is even more severe in terms of more realistic or difficult scenarios, such as code-switching ASR (CS-ASR). To address this, we present a framework for developing more efficient models for CS-ASR through knowledge distillation using realistic speech-only data. Our proposed method, Leave No Knowledge Behind During Knowledge Distillation (K$^{2}$D), leverages both the teacher model's knowledge and additional insights from a small auxiliary model. We evaluate our approach on two in-domain and two out-domain datasets, demonstrating that K$^{2}$D is effective. By conducting K$^{2}$D on the unlabeled realistic data, we have successfully obtained a 2-time smaller model…
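
The abstract does not spell out the K$^{2}$D training objective, so the following is only a point of reference: a minimal sketch of the generic soft-target distillation loss (a temperature-softened KL divergence between teacher and student output distributions). It is not the authors' method. The function names, the example logits, and the temperature value are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the conventional rescaling used in soft-target
    distillation so that gradient magnitudes stay comparable across
    temperatures. Hypothetical helper, not taken from the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Illustrative logits for one output token over a three-entry vocabulary.
teacher = [1.0, 2.0, 3.0]
matched_student = [1.0, 2.0, 3.0]
mismatched_student = [3.0, 2.0, 1.0]

print(distillation_loss(teacher, matched_student))
print(distillation_loss(teacher, mismatched_student))
```

In this sketch the loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. With speech-only data, as in the setting the abstract describes, the teacher's outputs stand in for the missing human transcriptions.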

Taxonomy

Topics: Fault Detection and Control Systems · Ferroelectric and Negative Capacitance Devices

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Knowledge Distillation