Improving CTC-AED model with integrated-CTC and auxiliary loss regularization

Daobin Zhu; Xiangdong Su; Hongbin Zhang

arXiv:2308.08449·cs.CL·August 17, 2023

Open Access

TL;DR

This paper proposes an integrated-CTC approach with auxiliary loss regularization for speech recognition, improving convergence and accuracy by combining CTC and AED models through novel fusion methods.

Contribution

It introduces integrated-CTC with auxiliary loss regularization and two fusion methods, enhancing model convergence and performance in speech recognition tasks.

Findings

01. DAL method improves attention rescoring accuracy

02. PMP method enhances CTC prefix beam search and greedy search

03. Auxiliary loss accelerates model convergence

Abstract

Joint training of connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models has been widely applied in automatic speech recognition (ASR). Unlike most hybrid models, which calculate the CTC and AED losses separately, our proposed integrated-CTC uses the attention mechanism of AED to guide the output of CTC. In this paper, we employ two fusion methods, namely direct addition of logits (DAL) and preserving the maximum probability (PMP). We achieve dimensional consistency by adaptively applying an affine transformation to the attention results so that they match the dimensions of CTC. To accelerate model convergence and improve accuracy, we introduce auxiliary loss regularization. Experimental results demonstrate that the DAL method performs better in attention rescoring, while the PMP method excels in CTC prefix beam search and greedy search.
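
As a rough illustration of the DAL idea in the abstract (affine-transform the attention results to the CTC dimensions, then add the logits), a minimal NumPy sketch might look like the following. The abstract does not specify tensor shapes or the form of the affine map, so the function name `dal_fuse`, the parameters `weight` and `bias`, the shapes, and the choice to apply the map along the length axis are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def dal_fuse(ctc_logits, att_logits, weight, bias):
    """Hypothetical DAL-style fusion (assumed shapes, not from the paper).

    ctc_logits: (T, V) frame-level CTC logits
    att_logits: (L, V) token-level attention decoder logits
    weight:     (T, L) assumed learnable affine weight
    bias:       (T, 1) assumed learnable affine bias
    """
    # Affine map along the length axis: (T, L) @ (L, V) -> (T, V),
    # so the attention results match the CTC dimensions.
    att_aligned = weight @ att_logits + bias
    # Direct addition of logits, then normalise over the vocabulary.
    fused = ctc_logits + att_aligned
    return log_softmax(fused, axis=-1)


# Toy inputs: T frames, L decoder steps, V vocabulary entries.
rng = np.random.default_rng(0)
T, L, V = 6, 3, 5
ctc_logits = rng.normal(size=(T, V))
att_logits = rng.normal(size=(L, V))
weight = rng.normal(size=(T, L)) * 0.1
bias = np.zeros((T, 1))

fused = dal_fuse(ctc_logits, att_logits, weight, bias)
print(fused.shape)
```

The fused log-probabilities keep the CTC shape `(T, V)`, so under these assumptions they could be passed to a standard CTC loss or decoder in place of the plain CTC output.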

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing