Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
Eungbeom Kim, Hantae Kim, Kyogu Lee

TL;DR
This paper proposes a self-knowledge distillation approach for Transformer-based CTC ASR models in which the teacher and student share encoder layers. This reduces teacher-student alignment disagreement and improves both frame-level alignment and overall recognition performance.
Contribution
It introduces a novel self-knowledge distillation method that guides frame-level alignments by using a sub-model of the shared encoder as the student. This removes the need for separate teacher and student models and improves both efficiency and accuracy.
Findings
Improved ASR performance with reduced alignment disagreement.
Enhanced resource efficiency through shared encoder layers.
Effective in guiding frame-level alignments during training.
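The findings refer to reduced alignment disagreement without defining a metric. One plausible way to quantify it, assumed here rather than taken from the paper, is the fraction of frames where the teacher's and the student's best-path (argmax) CTC labels differ. A related measure is the set of non-blank "spike" frames each model produces. The sketch uses plain Python lists of per-frame posteriors and assumes the blank symbol is index 0. The function names are hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumption: the CTC blank symbol is vocabulary index 0


def frame_argmax(posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Best-path (greedy) label for each frame of a T x V posterior matrix."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]


def spike_frames(labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Indices of frames whose best-path label is a non-blank token (CTC spikes)."""
    return [t for t, label in enumerate(labels) if label != blank]


def alignment_disagreement(teacher: Sequence[Sequence[float]],
                           student: Sequence[Sequence[float]]) -> float:
    """Fraction of frames where teacher and student best-path labels differ."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of frames")
    if not teacher:
        return 0.0
    t_labels = frame_argmax(teacher)
    s_labels = frame_argmax(student)
    mismatches = sum(a != b for a, b in zip(t_labels, s_labels))
    return mismatches / len(t_labels)
```

Consider a toy case where both models emit the same token, but the student's spike arrives one frame later than the teacher's (frame 2 instead of frame 1) over four frames. In that case two frames disagree, and `alignment_disagreement` returns 0.5 even though the greedy transcripts match. This illustrates why frame-level disagreement can hurt frame-wise distillation without showing up in the output text.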
Abstract
The Transformer encoder with the connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR suffers from disagreement between the teacher and student models in frame-level alignment, which ultimately prevents it from improving the student model's performance. To resolve this problem, this paper introduces a self-knowledge distillation (SKD) method that guides the frame-level alignment during training. In contrast to the conventional approach of using separate teacher and student models, this study introduces a simple and effective method that shares encoder layers and uses the sub-model as the student model. Overall, our approach is effective in improving both resource efficiency and performance. We also conducted an experimental analysis of the spike timings to illustrate that…
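The shared-encoder idea in the abstract can be sketched as a single layer stack that is run once. The output after the first few layers serves as the student (the sub-model), and the output after all layers serves as the teacher. This is a minimal sketch under several assumptions that the abstract does not confirm:
- the student attaches at a depth given by a hypothetical `student_depth` parameter;
- the distillation term is a frame-wise KL divergence between teacher and student CTC posteriors, which is a common frame-level KD choice;
- the teacher is treated as a fixed target;
- both branches keep their own CTC loss, combined with a hypothetical `kd_weight`.

The layers are arbitrary callables, so no deep-learning library is required.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

H = TypeVar("H")  # hidden-state type; kept generic for this sketch


def shared_encoder_forward(x: H, layers: Sequence[Callable[[H], H]],
                           student_depth: int) -> Tuple[H, H]:
    """Run one layer stack once and return (student_out, teacher_out).

    The student is the sub-model formed by the first `student_depth` layers
    (hypothetical parameter), so it reuses the teacher's lower layers instead
    of being a separate network.
    """
    if not 1 <= student_depth <= len(layers):
        raise ValueError("student_depth must be in [1, len(layers)]")
    h = x
    student_out = x
    for depth, layer in enumerate(layers, start=1):
        h = layer(h)
        if depth == student_depth:
            student_out = h
    return student_out, h


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def frame_kd_loss(teacher_logits: Sequence[Sequence[float]],
                  student_logits: Sequence[Sequence[float]],
                  temperature: float = 1.0) -> float:
    """Mean over frames of KL(teacher || student) between per-frame CTC posteriors.

    Assumption: the teacher is a fixed target (in an autograd framework it
    would be detached), so only the student is pulled toward the teacher's
    frame-level alignment.
    """
    if len(teacher_logits) != len(student_logits) or not teacher_logits:
        raise ValueError("need the same, non-zero number of frames")
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        log_p = log_softmax([v / temperature for v in t_frame])
        log_q = log_softmax([v / temperature for v in s_frame])
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits)


def skd_objective(ctc_teacher: float, ctc_student: float, kd: float,
                  kd_weight: float = 1.0) -> float:
    """Hypothetical total loss: both branches' CTC losses plus a weighted KD term."""
    return ctc_teacher + ctc_student + kd_weight * kd
```

Suppose there are four toy layers that each add 1, and `student_depth=2`. Then `shared_encoder_forward(0, layers, 2)` returns `(2, 4)`: one forward pass yields both the student's and the teacher's outputs. Because the student is part of the teacher, no second network has to be stored or run. This is the resource saving the abstract refers to.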
