Improving CTC-based ASR Models with Gated Interlayer Collaboration
Yuting Yang, Yuke Li, Binbin Du

TL;DR
This paper introduces Gated Interlayer Collaboration (GIC), a mechanism that injects textual information into CTC-based speech recognition models and improves accuracy without an external language model.
Contribution
The paper proposes the GIC mechanism, which incorporates textual information into CTC models through inter-layer auxiliary CTC losses and a gate unit, thereby relaxing the conditional independence assumption of CTC.
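For context, the independence assumption mentioned here is the standard CTC factorization. The notation below is the usual textbook form and is not taken from this page: x is the acoustic input of T frames, y the label sequence, a a frame-level alignment, and B the mapping that collapses repeats and removes blanks.

```latex
P(y \mid x) \;=\; \sum_{a \in \mathcal{B}^{-1}(y)} \; \prod_{t=1}^{T} P(a_t \mid x)
```

Each frame's label is predicted from the audio alone, independently of the labels emitted at other frames, which is why a plain CTC model has no direct way to model textual dependencies.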
Findings
Outperforms strong baselines on AISHELL-1, TEDLIUM2, and AIDATATANG datasets.
Effectively integrates textual information to improve recognition accuracy.
Demonstrates the benefit of inter-layer collaboration in CTC models.
Abstract
CTC-based automatic speech recognition (ASR) models without an external language model usually lack the capacity to model conditional dependencies and textual interactions. In this paper, we present a Gated Interlayer Collaboration (GIC) mechanism to improve the performance of CTC-based models; it introduces textual information into the model and thus relaxes the conditional independence assumption of CTC-based models. Specifically, we take the weighted sum of token embeddings as the textual representation for each position, where the position-specific weights are the softmax probability distribution constructed via inter-layer auxiliary CTC losses. The textual representations are then fused with the acoustic features through a gate unit. Experiments on the AISHELL-1, TEDLIUM2, and AIDATATANG corpora show that the proposed method outperforms several strong baselines.
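The mechanism described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name gic_fuse, the linear projection w_ctc used to obtain intermediate CTC posteriors, and the exact gate form (a sigmoid over the concatenated acoustic and textual features, used for a convex combination) are all assumptions, since the abstract only states that a gate unit fuses the two representations.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gic_fuse(acoustic, w_ctc, embed, w_gate, b_gate):
    """Hypothetical GIC-style fusion at one intermediate encoder layer.

    acoustic: (T, d) encoder features for T frames
    w_ctc:    (d, V) assumed projection to vocabulary logits (auxiliary CTC head)
    embed:    (V, d) token embedding table
    w_gate:   (2d, d) and b_gate: (d,) assumed gate parameters
    """
    # Position-specific softmax distribution over tokens, (T, V).
    probs = softmax(acoustic @ w_ctc)
    # Textual representation: probability-weighted sum of token embeddings, (T, d).
    textual = probs @ embed
    # Assumed gate: sigmoid over concatenated features, one value per dimension.
    gate = sigmoid(np.concatenate([acoustic, textual], axis=-1) @ w_gate + b_gate)
    # Convex combination of acoustic and textual features, (T, d).
    fused = gate * acoustic + (1.0 - gate) * textual
    return fused, probs


# Toy usage with random parameters (shapes only; no trained weights).
rng = np.random.default_rng(0)
T, d, V = 5, 8, 11
acoustic = rng.normal(size=(T, d))
fused, probs = gic_fuse(
    acoustic,
    rng.normal(size=(d, V)),
    rng.normal(size=(V, d)),
    rng.normal(size=(2 * d, d)),
    np.zeros(d),
)
```

In training, the intermediate posteriors (probs here) would also feed an auxiliary CTC loss at that layer, so the weights used for the textual representation are themselves supervised; that loss is omitted from this sketch.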
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Softmax · Graph InfoClust
