Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
Longhao Li, Yangze Li, Hongfei Xue, Jie Liu, Shuai Fang, Kai Wang, Lei Xie

TL;DR
This paper introduces Delayed-KD, a novel method for low-latency streaming ASR that uses delayed knowledge distillation and a Temporal Alignment Buffer to improve accuracy and control token emission delay.
Contribution
The paper proposes Delayed-KD, a new approach applying delayed knowledge distillation with a Temporal Alignment Buffer to enhance streaming ASR performance.
Findings
Delayed-KD achieves a 5.42% CER on AISHELL-1 at 40 ms latency, comparable to models with higher latency.
The method improves accuracy with small chunk sizes and reduces token emission delay.
Experiments on AISHELL-1 and WenetSpeech datasets demonstrate consistent improvements.
Abstract
CTC-based streaming ASR has gained significant attention in real-world applications but faces two main challenges: accuracy degradation in small chunks and token emission latency. To mitigate these challenges, we propose Delayed-KD, which applies delayed knowledge distillation on CTC posterior probabilities from a non-streaming to a streaming model. Specifically, with a tiny chunk size, we introduce a Temporal Alignment Buffer (TAB) that defines a relative delay range compared to the non-streaming teacher model to align CTC outputs and mitigate non-blank token mismatches. Additionally, TAB enables fine-grained control over token emission delay. Experiments on 178-hour AISHELL-1 and 10,000-hour WenetSpeech Mandarin datasets show consistent superiority of Delayed-KD. Impressively, Delayed-KD at 40 ms latency achieves a lower character error rate (CER) of 5.42% on AISHELL-1, comparable to…
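The idea of distilling CTC posteriors within a relative delay range can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the loss formula, so the code below assumes one plausible reading, in which each teacher frame is compared against the student frames inside a window of `max_delay` frames and the best-matching one counts. The function names `kl_divergence` and `delayed_kd_loss`, the `max_delay` parameter, and the choice of KL divergence with a minimum over the window are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions given as lists of floats."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def delayed_kd_loss(teacher, student, max_delay):
    """Hypothetical delay-tolerant distillation loss (an assumption, not the paper's formula).

    teacher, student: per-frame posteriors, each a list of distributions
    over the same vocabulary and with the same number of frames.
    max_delay: how many frames later than the teacher the student may emit.
    """
    n = len(teacher)
    total = 0.0
    for t in range(n):
        # Student frames t .. t + max_delay, clipped at the end of the utterance.
        last = min(t + max_delay, n - 1)
        total += min(
            kl_divergence(teacher[t], student[s]) for s in range(t, last + 1)
        )
    return total / n


# Toy example with a two-symbol vocabulary: [blank, "A"].
BLANK = [0.9, 0.1]
TOKEN = [0.1, 0.9]
teacher = [BLANK, TOKEN, BLANK, BLANK]  # non-streaming model emits "A" at frame 1
student = [BLANK, BLANK, TOKEN, BLANK]  # streaming model emits "A" one frame later

strict = delayed_kd_loss(teacher, student, max_delay=0)
relaxed = delayed_kd_loss(teacher, student, max_delay=1)
print(strict > relaxed)
```

With `max_delay=0` the student is penalized for emitting the token one frame late, while `max_delay=1` tolerates the shift. Under this reading, shrinking or widening the window is what would give control over how late tokens may be emitted.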
Taxonomy
Topics: Speech Recognition and Synthesis · Fault Detection and Control Systems · Neural Networks and Applications
