Label-Context-Dependent Internal Language Model Estimation for CTC
Zijian Yang, Minh-Nghia Phan, Ralf Schlüter, Hermann Ney

TL;DR
This paper investigates how CTC, despite its label context independence assumption, implicitly learns a context-dependent internal language model (ILM), and proposes knowledge-distillation-based methods to estimate that ILM, yielding over 13% relative WER reduction compared to shallow fusion.
Contribution
It introduces context-dependent ILM estimation techniques for CTC based on knowledge distillation (KD), supported by theoretical justifications, together with two regularization methods for KD.
Findings
Context-dependent ILMs outperform context-independent priors in cross-domain tasks.
Proposed label-level KD with smoothing surpasses other ILM estimation methods.
Achieves over 13% relative WER reduction compared to shallow fusion.
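The findings above compare ILM-corrected decoding against shallow fusion and report a relative WER reduction. As a minimal sketch of what those terms usually mean, the snippet below combines per-hypothesis log scores log-linearly and computes a relative WER reduction. The function names, the scale parameters `lm_scale` and `ilm_scale`, and all numbers are hypothetical illustrations, not values or code from the paper.

```python
def fused_score(am_logp, lm_logp, ilm_logp=0.0, lm_scale=0.5, ilm_scale=0.0):
    """Log-linear score of one hypothesis.

    am_logp:  log-probability from the acoustic (CTC) model
    lm_logp:  log-probability from the external language model
    ilm_logp: log-probability from the estimated internal LM
    With ilm_scale=0.0 this reduces to plain shallow fusion;
    with ilm_scale>0.0 the ILM score is subtracted (ILM correction).
    """
    return am_logp + lm_scale * lm_logp - ilm_scale * ilm_logp


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction of new_wer with respect to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical scores for a single hypothesis (illustration only).
shallow_fusion = fused_score(-10.0, -4.0)
ilm_corrected = fused_score(-10.0, -4.0, ilm_logp=-6.0, ilm_scale=0.25)

# Hypothetical WERs: 10.0% baseline vs. 8.7% is a 13% relative reduction.
reduction = relative_wer_reduction(10.0, 8.7)
```

In practice the two scales would typically be tuned on a development set; the sketch only shows how the three scores interact.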
Abstract
Although connectionist temporal classification (CTC) has the label context independence assumption, it can still implicitly learn a context-dependent internal language model (ILM) due to modern powerful encoders. In this work, we investigate the implicit context dependency modeled in the ILM of CTC. To this end, we propose novel context-dependent ILM estimation methods for CTC based on knowledge distillation (KD) with theoretical justifications. Furthermore, we introduce two regularization methods for KD. We conduct experiments on Librispeech and TED-LIUM Release 2 datasets for in-domain and cross-domain evaluation, respectively. Experimental results show that context-dependent ILMs outperform the context-independent priors in cross-domain evaluation, indicating that CTC learns a context-dependent ILM. The proposed label-level KD with smoothing method surpasses other ILM estimation…
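The abstract describes label-level KD with smoothing but, as summarized here, does not give the exact objective. The following is therefore only a generic sketch of a label-level distillation loss: a per-position KL divergence between a teacher distribution over labels and a student (ILM) distribution. The smoothing shown, interpolation of the teacher with a uniform distribution, is an assumed stand-in and may differ from the paper's method; all names and the `alpha` parameter are hypothetical.

```python
import math


def smooth(probs, alpha=0.1):
    """Interpolate a distribution with the uniform one (assumed smoothing)."""
    vocab_size = len(probs)
    return [(1.0 - alpha) * p + alpha / vocab_size for p in probs]


def kl_divergence(teacher, student):
    """KL(teacher || student) over a shared label set.

    Assumes every student probability is strictly positive,
    as is the case for softmax outputs.
    """
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)


def label_level_kd_loss(teacher_dists, student_dists, alpha=0.1):
    """Mean per-label-position KL between smoothed teacher and student."""
    losses = [
        kl_divergence(smooth(t, alpha), s)
        for t, s in zip(teacher_dists, student_dists)
    ]
    return sum(losses) / len(losses)


# Toy example: two label positions, vocabulary of three labels.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
student = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
loss = label_level_kd_loss(teacher, student, alpha=0.1)
```

The loss is zero only when the student matches the smoothed teacher at every position, so minimizing it pushes the student toward the teacher's label-level distributions.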
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Authorship Attribution and Profiling
Methods: Knowledge Distillation
