Label-Context-Dependent Internal Language Model Estimation for CTC

Zijian Yang; Minh-Nghia Phan; Ralf Schlüter; Hermann Ney

arXiv:2506.06096 · cs.SD · June 9, 2025

Open Access

TL;DR

This paper investigates how CTC implicitly learns a context-dependent internal language model (ILM) and proposes knowledge-distillation-based methods to estimate it, achieving over 13% relative WER reduction compared to shallow fusion.

Contribution

It introduces context-dependent ILM estimation methods for CTC based on knowledge distillation (KD), with theoretical justification, and two regularization methods for KD.

Findings

01. Context-dependent ILMs outperform context-independent priors in cross-domain evaluation.

02. The proposed label-level KD with smoothing surpasses other ILM estimation methods.

03. It achieves over 13% relative WER reduction compared to shallow fusion.
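
For reference, a relative WER reduction is the absolute drop in WER divided by the baseline WER. The sketch below is only an illustration of that arithmetic; the function name and the WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs in percent (NOT results from the paper):
# a shallow-fusion baseline and a system with ILM correction.
baseline = 10.0
with_ilm = 8.7
print(f"relative reduction: {relative_wer_reduction(baseline, with_ilm):.1%}")
```

A relative figure depends on the baseline, so the same relative reduction can correspond to very different absolute WER changes.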

Abstract

Although connectionist temporal classification (CTC) has the label context independence assumption, it can still implicitly learn a context-dependent internal language model (ILM) due to modern powerful encoders. In this work, we investigate the implicit context dependency modeled in the ILM of CTC. To this end, we propose novel context-dependent ILM estimation methods for CTC based on knowledge distillation (KD) with theoretical justifications. Furthermore, we introduce two regularization methods for KD. We conduct experiments on Librispeech and TED-LIUM Release 2 datasets for in-domain and cross-domain evaluation, respectively. Experimental results show that context-dependent ILMs outperform the context-independent priors in cross-domain evaluation, indicating that CTC learns a context-dependent ILM. The proposed label-level KD with smoothing method surpasses other ILM estimation…
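
To make the comparison with shallow fusion concrete, the sketch below contrasts the two scoring rules as they are commonly written: shallow fusion adds a scaled external LM log-probability to the CTC score, while ILM correction additionally subtracts a scaled estimate of the internal LM. This is an assumption about the general decoding setup, not the paper's exact formulation; the `Hypothesis` class, the scale values, and all log-probabilities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    log_p_ctc: float  # log-probability of the label sequence from the CTC model
    log_p_lm: float   # log-probability from the external (target-domain) LM
    log_p_ilm: float  # log-probability from the estimated internal LM


def shallow_fusion_score(h: Hypothesis, lm_scale: float) -> float:
    """CTC score plus the scaled external LM score."""
    return h.log_p_ctc + lm_scale * h.log_p_lm


def ilm_corrected_score(h: Hypothesis, lm_scale: float, ilm_scale: float) -> float:
    """Shallow fusion score minus the scaled internal LM score."""
    return shallow_fusion_score(h, lm_scale) - ilm_scale * h.log_p_ilm


# Hypothetical values: the first hypothesis is favoured by the source-domain
# bias of the internal LM, the second by the external LM.
hyps = [
    Hypothesis("source domain phrase", log_p_ctc=-3.0, log_p_lm=-9.0, log_p_ilm=-3.0),
    Hypothesis("target domain phrase", log_p_ctc=-4.5, log_p_lm=-7.0, log_p_ilm=-8.0),
]

best_sf = max(hyps, key=lambda h: shallow_fusion_score(h, lm_scale=0.5))
best_ilm = max(hyps, key=lambda h: ilm_corrected_score(h, lm_scale=0.5, ilm_scale=0.3))
print("shallow fusion picks:", best_sf.text)
print("ILM correction picks:", best_ilm.text)
```

With these made-up numbers, subtracting the internal LM score removes part of the source-domain bias, so the ranking changes. In practice both scales would be tuned on held-out data.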

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Authorship Attribution and Profiling

Methods: Knowledge Distillation