Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization

Zhengkun Tian; Hongyu Xiang; Min Li; Feifei Lin; Ke Ding; Guanglu Wan

arXiv:2211.03284 · eess.AS · March 17, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a simple peak-first regularization method for CTC models that shifts predicted peaks earlier in time, reducing peak latency by about 100-200 ms without sacrificing recognition accuracy.

Contribution

The paper proposes a novel regularization technique that encourages CTC models to predict peaks earlier by applying a frame-wise knowledge distillation function, rather than modifying the token transitions in the forward-backward algorithm or the gradient calculation of the CTC loss.
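
The distillation idea can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: it assumes that the teacher for frame t is the model's own distribution at frame t + 1 (treated as a constant), that the regularizer is a KL divergence averaged over frames, and that it is added to the CTC loss with a scalar weight. The names `peak_first_regularization` and `total_loss`, and the default `weight`, are hypothetical.

```python
import math
from typing import List

Dist = List[float]  # one probability distribution over the vocabulary (blank included)


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def peak_first_regularization(probs: List[Dist]) -> float:
    """Hypothetical frame-wise distillation term.

    ASSUMPTION: the teacher for frame t is the model's own distribution at
    frame t + 1. In a real training framework it would be detached so no
    gradient flows through it. Pulling frame t toward frame t + 1 makes a
    peak at t + 1 also appear at t, i.e. peaks move toward earlier frames.
    """
    if len(probs) < 2:
        return 0.0
    terms = [kl_divergence(probs[t + 1], probs[t]) for t in range(len(probs) - 1)]
    return sum(terms) / len(terms)


def total_loss(ctc_loss: float, probs: List[Dist], weight: float = 0.1) -> float:
    """CTC loss plus the weighted regularizer; the weight value is hypothetical."""
    return ctc_loss + weight * peak_first_regularization(probs)


# Vocabulary [blank, "a"]; the peak for "a" sits on frame 1.
frames = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05]]
print(peak_first_regularization(frames))
```

The regularizer is zero only when neighbouring frames already agree, so it leaves the CTC loss itself untouched and acts purely as an extra training signal.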

Findings

01. Reduces peak latency by 100-200 ms
02. Maintains recognition accuracy
03. Effective for both streaming and non-streaming models
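
One plausible way to express a peak-latency figure in milliseconds is sketched here. It assumes each predicted peak is paired with a reference frame for the same token (for example from a forced alignment) and a hypothetical 40 ms frame shift. The function name `average_peak_latency_ms` and the toy frame indices are illustrative only; they are not results from the paper.

```python
from typing import List


def average_peak_latency_ms(
    peak_frames: List[int],
    reference_frames: List[int],
    frame_shift_ms: float = 40.0,
) -> float:
    """Mean delay (ms) between predicted peaks and reference token frames.

    ASSUMPTIONS: peak_frames[i] and reference_frames[i] refer to the same token,
    and frame_shift_ms = 40.0 is a hypothetical encoder frame rate, not a value
    taken from the paper.
    """
    if len(peak_frames) != len(reference_frames):
        raise ValueError("each predicted peak needs a matching reference frame")
    if not peak_frames:
        return 0.0
    delays = [(p - r) * frame_shift_ms for p, r in zip(peak_frames, reference_frames)]
    return sum(delays) / len(delays)


reference = [3, 10, 18]          # toy reference frames
baseline_peaks = [8, 15, 23]     # toy peaks from a plain CTC model
regularized_peaks = [5, 12, 20]  # toy peaks after peak-first training

baseline = average_peak_latency_ms(baseline_peaks, reference)
regularized = average_peak_latency_ms(regularized_peaks, reference)
print(baseline, regularized, baseline - regularized)  # 200.0 80.0 120.0
```

Because latency is measured in frames and then scaled, the same shift of peaks translates into a larger millisecond reduction for models with a coarser frame rate.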

Abstract

CTC models have been widely applied in many scenarios because of their simple structure, excellent performance, and fast inference speed. The probability distribution predicted by a CTC model contains many peaks, each representing a non-blank token. The recognition latency of CTC models can be reduced by encouraging the model to predict peaks earlier. Existing methods for reducing latency require modifying both the transition relationships between tokens in the forward-backward algorithm and the gradient calculation, and some even depend on forced alignments provided by other pretrained models. These methods are complex to implement. To reduce peak latency, we propose a simple and novel method named peak-first regularization, which uses a frame-wise knowledge distillation function to force the probability distribution of the CTC…
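
The notion of a peak can be made concrete with a greedy CTC pass over frame posteriors. In this sketch, which assumes blank is vocabulary index 0 and uses a hypothetical helper `find_peaks`, a frame counts as a peak when its most likely label is a non-blank token that does not repeat the previous frame's label; an earlier peak frame means the token is emitted sooner.

```python
from typing import List, Tuple

BLANK = 0  # ASSUMPTION: index 0 of the vocabulary is the CTC blank


def find_peaks(probs: List[List[float]], blank: int = BLANK) -> List[Tuple[int, int]]:
    """Return (frame, token) for every non-blank token emitted by greedy CTC decoding.

    Consecutive frames with the same argmax token collapse into one token, as in
    standard CTC decoding, so only the first frame of such a run is reported.
    """
    peaks = []
    previous = blank
    for t, dist in enumerate(probs):
        token = max(range(len(dist)), key=dist.__getitem__)  # argmax label at frame t
        if token != blank and token != previous:
            peaks.append((t, token))
        previous = token
    return peaks


# Vocabulary [blank, "a", "b"]; five frames of posteriors.
posteriors = [
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.1, 0.7],
    [0.9, 0.05, 0.05],
]
print(find_peaks(posteriors))  # [(1, 1), (3, 2)]
```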

Taxonomy

Topics: Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis

Methods: Connectionist Temporal Classification Loss · Knowledge Distillation