Improving CTC-AED model with integrated-CTC and auxiliary loss regularization
Daobin Zhu, Xiangdong Su, Hongbin Zhang

TL;DR
This paper proposes an integrated-CTC approach with auxiliary loss regularization for speech recognition, improving convergence and accuracy by combining CTC and AED models through novel fusion methods.
Contribution
It introduces integrated-CTC with auxiliary loss regularization and two fusion methods, enhancing model convergence and performance in speech recognition tasks.
Findings
DAL method improves attention rescoring accuracy
PMP method enhances CTC prefix beam search and greedy search
Auxiliary loss accelerates model convergence
Abstract
Connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) joint training has been widely applied in automatic speech recognition (ASR). Unlike most hybrid models, which calculate the CTC and AED losses separately, our proposed integrated-CTC utilizes the attention mechanism of AED to guide the output of CTC. In this paper, we employ two fusion methods, namely direct addition of logits (DAL) and preserving the maximum probability (PMP). We achieve dimensional consistency by adaptively affine transforming the attention results to match the dimensions of CTC. To accelerate model convergence and improve accuracy, we introduce auxiliary loss regularization. Experimental results demonstrate that the DAL method performs better in attention rescoring, while the PMP method excels in CTC prefix beam search and greedy search.
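The DAL idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the code assumes that the attention output is already aligned to the CTC frames, that the affine map is a single weight matrix plus bias, and that fusion is a plain per-frame sum followed by a log-softmax. The names `affine`, `log_softmax`, `dal_fuse`, `weight`, and `bias` are hypothetical, and PMP is not sketched because the abstract does not describe it in enough detail.

```python
import math

def affine(x, weight, bias):
    """Map a D-dim vector to V dims: y = x @ weight + bias (weight is D x V)."""
    return [sum(x[d] * weight[d][v] for d in range(len(x))) + bias[v]
            for v in range(len(bias))]

def log_softmax(logits):
    """Numerically stable log-softmax over one frame."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def dal_fuse(ctc_logits, att_out, weight, bias):
    """Assumed DAL: add the affine-projected attention output to the CTC
    logits frame by frame, then normalize. Frame alignment is an assumption."""
    fused = []
    for ctc_t, att_t in zip(ctc_logits, att_out):
        proj = affine(att_t, weight, bias)  # D dims -> V dims
        fused.append(log_softmax([c + p for c, p in zip(ctc_t, proj)]))
    return fused

# Toy example: T=2 frames, vocabulary V=3, attention dimension D=2.
ctc_logits = [[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]]
att_out = [[1.0, 0.0], [0.0, 1.0]]
weight = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.0]]  # hypothetical learned parameters
bias = [0.0, 0.0, 0.0]
log_probs = dal_fuse(ctc_logits, att_out, weight, bias)
```

In a real system the fused log-probabilities would feed the CTC loss during training and CTC decoding at inference; how the paper weights or schedules the two branches is not stated in the abstract.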
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
