CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey

TL;DR
This paper introduces CR-CTC, a novel regularization technique for CTC-based speech recognition that enforces consistency between augmented views, leading to improved performance and state-of-the-art results on multiple datasets.
Contribution
It proposes a new consistency regularization method for CTC that enhances speech recognition accuracy by reducing overfitting and improving generalization.
Findings
CR-CTC significantly improves CTC performance on multiple datasets.
Achieves state-of-the-art results comparable to those of transducer and CTC/AED (CTC combined with an attention-based encoder-decoder) systems.
Effectively reduces overfitting and enhances model generalization.
Abstract
Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency. However, it often falls short in recognition performance. In this work, we propose the Consistency-Regularized CTC (CR-CTC), which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram. We provide in-depth insights into its essential behaviors from three perspectives: 1) it conducts self-distillation between random pairs of sub-models that process different augmented views; 2) it learns contextual representation through masked prediction for positions within time-masked regions, especially when we increase the amount of time masking; 3) it suppresses the extremely peaky CTC distributions, thereby reducing overfitting and improving the generalization…
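The consistency idea described in the abstract can be illustrated with a minimal sketch. It assumes a symmetric KL divergence between frame-wise output distributions as the consistency measure and a weighting factor `alpha` for combining it with the two CTC losses; the abstract does not state either choice, so both are assumptions, and all function names here are hypothetical. Gradient handling (for example, stopping gradients through the target distribution) is omitted because the sketch only computes loss values.

```python
import math


def softmax(logits):
    """Convert one frame of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def consistency_loss(logits_a, logits_b):
    """Symmetric KL between two views, averaged over frames.

    ASSUMPTION: symmetric KL is an illustrative choice of consistency
    measure; the source text only says that consistency is enforced.
    Each argument is a list of frames, each frame a list of logits.
    """
    assert len(logits_a) == len(logits_b), "views must have equal length"
    total = 0.0
    for frame_a, frame_b in zip(logits_a, logits_b):
        p_a, p_b = softmax(frame_a), softmax(frame_b)
        total += 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))
    return total / len(logits_a)


def total_loss(ctc_a, ctc_b, cr, alpha=0.2):
    """Average the two CTC losses and add the weighted consistency term.

    ASSUMPTION: `alpha` and its default value are hypothetical.
    """
    return 0.5 * (ctc_a + ctc_b) + alpha * cr


# Two toy "views": 2 frames, vocabulary of 3 symbols (index 0 as blank).
view_a = [[2.0, 0.5, 0.1], [0.1, 3.0, 0.2]]
view_b = [[1.5, 0.7, 0.3], [0.4, 2.0, 0.9]]
cr = consistency_loss(view_a, view_b)
```

In this sketch the consistency term is zero when both views yield identical distributions and grows as they diverge, so minimizing it pulls the two outputs toward each other.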
Taxonomy
Topics: Speech Recognition and Synthesis · Advanced Data Compression Techniques
