Align-Consistency: Improving Non-autoregressive and Semi-supervised ASR with Consistency Regularization
Wanting Huang, Weiran Wang

TL;DR
This paper introduces Align-Consistency, a regularization technique for non-autoregressive ASR models that improves accuracy and robustness by keeping predictions consistent across input perturbations, in both supervised and semi-supervised learning.
Contribution
It extends consistency regularization from CTC to Align-Refine, a non-autoregressive model that iteratively refines frame-level hypotheses, improving recognition performance in both supervised and semi-supervised ASR.
Findings
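Applying consistency regularization (CR) to both the base CTC model and the refinement steps is critical; the accuracy gains from CR and from non-AR decoding are mutually additive.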
Align-Consistency boosts non-AR decoding performance.
In semi-supervised ASR, pseudo-labels generated online by fast non-AR decoding further improve results.
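The consistency idea behind these findings can be illustrated with a minimal sketch. This summary does not state the exact form of the loss, so the symmetric KL divergence used here, and the names consistency_loss and total_consistency, are assumptions for illustration rather than the authors' formulation. The sketch compares frame-level posteriors from two perturbed views of the same utterance and sums the term over the base CTC output and each refinement step.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def consistency_loss(logits_a, logits_b):
    """Symmetric KL between frame-level posteriors of two perturbed views.

    Assumption: symmetric KL is a common choice for consistency terms;
    the loss actually used in the paper may differ.
    logits_a, logits_b: arrays of shape (frames, vocab).
    """
    log_p = log_softmax(logits_a)
    log_q = log_softmax(logits_b)
    p, q = np.exp(log_p), np.exp(log_q)
    kl_pq = (p * (log_p - log_q)).sum(axis=-1)  # KL(p || q) per frame
    kl_qp = (q * (log_q - log_p)).sum(axis=-1)  # KL(q || p) per frame
    return 0.5 * (kl_pq + kl_qp).mean()

def total_consistency(view_a_logits, view_b_logits):
    # Hypothetical aggregation: one (frames, vocab) array per stage,
    # i.e. the base CTC output followed by each refinement step.
    return sum(consistency_loss(a, b)
               for a, b in zip(view_a_logits, view_b_logits))
```

In training, a term like this would be added to the usual supervised loss with a weighting factor; both the weight and the choice of perturbation are left unspecified here.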
Abstract
Consistency regularization (CR) improves the robustness and accuracy of Connectionist Temporal Classification (CTC) by ensuring predictions remain stable across input perturbations. In this work, we propose Align-Consistency, an extension of CR designed for Align-Refine -- a non-autoregressive (non-AR) model that performs iterative refinement of frame-level hypotheses. This method leverages the speed of parallel inference while significantly boosting recognition performance. The effectiveness of Align-Consistency is demonstrated in two settings. First, in the fully supervised setting, our results indicate that applying CR to both the base CTC model and the subsequent refinement steps is critical, and the accuracy improvements from non-AR decoding and CR are mutually additive. Second, for semi-supervised ASR, we employ fast non-AR decoding to generate online pseudo-labels on unlabeled…
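As a rough illustration of why non-AR decoding is cheap enough for online pseudo-labeling, the sketch below shows greedy CTC decoding, the simplest non-autoregressive decoder. It covers only a base CTC output and does not model Align-Refine's refinement steps; the function name ctc_greedy_decode and the blank index of 0 are assumptions for this example.

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Greedy CTC decoding: take the per-frame argmax, merge repeats, drop blanks.

    log_probs: array of shape (frames, vocab).
    blank: index of the CTC blank symbol (assumed to be 0 here).
    """
    path = np.argmax(log_probs, axis=-1)  # all frames decoded in parallel
    labels = []
    prev = None
    for token in path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if token != prev and token != blank:
            labels.append(int(token))
        prev = token
    return labels
```

The resulting label sequence could serve as a pseudo-label for an unlabeled utterance; any filtering or confidence thresholding is outside the scope of this sketch.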
Taxonomy
Topics: Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies
