Analyzing the Importance of Blank for CTC-Based Knowledge Distillation
Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter

TL;DR
This paper investigates how different blank token handling strategies in CTC-based knowledge distillation affect speech recognition model performance, proposing a symmetric selection method that enables label-independent training.
Contribution
It introduces a novel symmetric blank selection approach that removes the need for CTC loss during distillation, facilitating training on untranscribed audio data.
Findings
Blank elimination does not always improve performance.
With the symmetric selection method, the CTC loss can be removed during distillation with minimal to no performance degradation.
Training becomes independent of target labels, enabling untranscribed data use.
Abstract
With the rise of large pre-trained foundation models for automatic speech recognition, new challenges appear. While the performance of these models is good, the runtime and cost of inference increase. One approach to make use of their strength while retaining efficiency is to distill their knowledge to smaller models during training. In this work, we explore different CTC-based distillation variants, focusing on blank token handling. We show that common approaches like blank elimination do not always work off the shelf. We explore new blank selection patterns as a potential sweet spot between standard knowledge distillation and blank elimination mechanisms. Through the introduction of a symmetric selection method, we are able to remove the CTC loss during knowledge distillation with minimal to no performance degradation. With this, we make the training independent from target labels,…
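The difference between standard frame-level distillation and blank elimination can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the blank label has index 0, the distillation criterion is a per-frame KL divergence between teacher and student posteriors, and a frame counts as blank when the teacher's most likely label is blank. The function names are hypothetical, and the sketch does not implement the paper's symmetric selection method.

```python
import math

BLANK = 0  # assumed blank index, not taken from the paper


def log_softmax(logits):
    """Numerically stable log-softmax over one frame of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def frame_kl(teacher_logits, student_logits):
    """KL(teacher || student) for a single frame."""
    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t, s))


def distillation_loss(teacher, student, eliminate_blank=False):
    """Mean frame-level KL over the selected frames.

    With eliminate_blank=True, frames whose teacher argmax is the
    blank label are skipped (an assumed form of blank elimination).
    """
    losses = []
    for t_frame, s_frame in zip(teacher, student):
        top = max(range(len(t_frame)), key=t_frame.__getitem__)
        if eliminate_blank and top == BLANK:
            continue
        losses.append(frame_kl(t_frame, s_frame))
    return sum(losses) / len(losses) if losses else 0.0


# Toy example: 3 frames, 3 labels; frames 0 and 2 are blank for the teacher.
teacher = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [5.0, 0.0, 0.0]]
student = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]

full_loss = distillation_loss(teacher, student)
no_blank_loss = distillation_loss(teacher, student, eliminate_blank=True)
```

In this toy example the student matches the teacher only on the non-blank frame, so the full loss is positive while the blank-eliminated loss is zero. This shows how discarding blank frames removes any training signal about where the student should emit blank, which is one plausible reason such selection schemes are usually paired with an additional loss.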
Taxonomy
Methods: Knowledge Distillation · Connectionist Temporal Classification Loss
