A context-aware knowledge transferring strategy for CTC-based ASR

Ke-Han Lu; Kuan-Yu Chen

arXiv:2210.06244 · cs.CL · October 13, 2022

TL;DR

This paper introduces a context-aware knowledge transfer approach for CTC-based ASR that incorporates linguistic information from language models to overcome the independence assumption limitation, improving recognition performance.

Contribution

It proposes a novel knowledge transferring module and context-aware training strategy to enhance CTC-based ASR by integrating linguistic context from pre-trained language models.

Findings

1. Improved accuracy on AISHELL datasets
2. Effective mitigation of token independence assumption
3. Enhanced performance with knowledge injection

Abstract

Non-autoregressive automatic speech recognition (ASR) modeling has received increasing attention recently because of its fast decoding speed and superior performance. Among representatives, methods based on the connectionist temporal classification (CTC) are still a dominating stream. However, the theoretically inherent flaw, the assumption of independence between tokens, creates a performance barrier for the school of works. To mitigate the challenge, we propose a context-aware knowledge transferring strategy, consisting of a knowledge transferring module and a context-aware training strategy, for CTC-based ASR. The former is designed to distill linguistic information from a pre-trained language model, and the latter is framed to modulate the limitations caused by the conditional independence assumption. As a result, a knowledge-injected context-aware CTC-based ASR built upon the…
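The independence assumption mentioned in the abstract can be illustrated with a small, generic sketch of CTC greedy decoding. This is not the authors' method or code; the blank symbol `_`, the toy per-frame distributions, and the function names are all assumptions made for illustration. The point it shows is that the score of an alignment path is a plain sum of per-frame log-probabilities, so the token chosen at one frame never depends on the tokens chosen at other frames.

```python
import math

BLANK = "_"  # hypothetical blank symbol, chosen for this illustration


def ctc_collapse(path):
    """Merge consecutive repeated symbols, then drop blanks."""
    out = []
    prev = None
    for tok in path:
        if tok != prev and tok != BLANK:
            out.append(tok)
        prev = tok
    return out


def path_log_prob(frame_probs, path):
    """Log-probability of one alignment path under the CTC factorization.

    Each frame contributes its own term; no term looks at neighbouring
    tokens. That is the conditional independence assumption.
    """
    return sum(math.log(frame_probs[t][tok]) for t, tok in enumerate(path))


def greedy_decode(frame_probs):
    """Pick the most likely symbol at every frame independently."""
    path = [max(p, key=p.get) for p in frame_probs]
    return ctc_collapse(path), path


# Toy posteriors over {a, b, blank} for four frames (made-up numbers).
frame_probs = [
    {"a": 0.6, "b": 0.3, BLANK: 0.1},
    {"a": 0.5, "b": 0.1, BLANK: 0.4},
    {"a": 0.1, "b": 0.2, BLANK: 0.7},
    {"a": 0.2, "b": 0.7, BLANK: 0.1},
]

tokens, path = greedy_decode(frame_probs)
print(path, "->", tokens)
print(path_log_prob(frame_probs, path))
```

Because every frame is scored in isolation, nothing in this decoder can prefer a linguistically plausible token sequence over an implausible one. That gap is what the paper's knowledge transferring module and context-aware training strategy are described as addressing.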

Code & Models

Repositories

kehanlu/mandarin-wav2vec2 (PyTorch, official)

Models

kehanlu/mandarin-wav2vec2-aishell1 (Hugging Face model · 7 downloads · 2 likes)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings