Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

Keqi Deng; Songjun Cao; Yike Zhang; Long Ma; Gaofeng Cheng; Ji Xu; Pengyuan Zhang

arXiv:2203.03582·cs.CL·March 8, 2022

TL;DR

This paper introduces two methods for transferring knowledge from pre-trained language models such as BERT and GPT2 into CTC-based speech recognition models, reducing CER on AISHELL-1 by 16.1% relative without external LMs.

Contribution

The paper proposes two knowledge transfer techniques from pre-trained LMs to improve CTC-based speech recognition, targeting the weakness that CTC's conditional independence assumption introduces.

Findings

01  Achieved a CER of 4.2% on the AISHELL-1 test set.

02  Reduced CER by 16.1% relative to vanilla CTC models.

03  Improved speech recognition performance without external language models.
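
The two metrics in the findings can be made concrete with a short sketch. CER is conventionally the character-level edit distance between reference and hypothesis divided by the reference length, and a relative reduction is (baseline - improved) / baseline. The function names and the sample strings and numbers below are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion of a reference character
                curr[j - 1] + 1,          # insertion of a hypothesis character
                prev[j - 1] + (r != h),   # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative error-rate reduction, as a fraction of the baseline."""
    return (baseline - improved) / baseline


# Hypothetical example: one substituted character in a six-character reference.
example_cer = cer("今天天气很好", "今天天汽很好")

# Hypothetical error rates, chosen only to show the arithmetic.
example_reduction = relative_reduction(5.0, 4.0)
```

Under this definition, a 16.1% relative reduction means the improved CER is 83.9% of the baseline CER; it is not a drop of 16.1 absolute percentage points.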

Abstract

Recently, end-to-end automatic speech recognition models based on connectionist temporal classification (CTC) have achieved impressive results, especially when fine-tuned from wav2vec2.0 models. Due to the conditional independence assumption, CTC-based models are always weaker than attention-based encoder-decoder models and require the assistance of external language models (LMs). To solve this issue, we propose two knowledge transferring methods that leverage pre-trained LMs, such as BERT and GPT2, to improve CTC-based models. The first method is based on representation learning, in which the CTC-based models use the representation produced by BERT as an auxiliary learning target. The second method is based on joint classification learning, which combines GPT2 for text modeling with a hybrid CTC/attention architecture. Experiment on AISHELL-1 corpus yields a character error rate (CER)…
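
The first method described in the abstract, using a language-model representation as an auxiliary learning target alongside the CTC loss, can be sketched roughly as follows in PyTorch. This is a toy illustration under several assumptions that do not come from the paper: a small GRU stands in for the wav2vec2.0 encoder, a random tensor stands in for the BERT output, the two representations are compared at utterance level after mean pooling, and the distance is mean squared error with a hypothetical weight `aux_weight`. The truncated abstract does not say how the paper aligns sequence lengths or which distance it uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CTCWithAuxTarget(nn.Module):
    """Toy CTC model with an auxiliary representation-matching loss."""

    def __init__(self, feat_dim=16, hidden=32, vocab=10, lm_dim=24, aux_weight=0.5):
        super().__init__()
        # Assumption: a GRU stands in for a pre-trained acoustic encoder.
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.ctc_head = nn.Linear(hidden, vocab)  # index 0 is the CTC blank
        self.proj = nn.Linear(hidden, lm_dim)     # maps into the LM embedding space
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)
        self.aux_weight = aux_weight              # hypothetical loss weight

    def forward(self, feats, feat_lens, tokens, token_lens, lm_repr):
        enc, _ = self.encoder(feats)                           # (B, T, hidden)
        log_probs = F.log_softmax(self.ctc_head(enc), dim=-1)  # (B, T, vocab)
        # nn.CTCLoss expects log-probabilities shaped (T, B, vocab).
        ctc_loss = self.ctc(log_probs.transpose(0, 1), tokens, feat_lens, token_lens)
        # Simplification: mean-pool over time and match utterance-level vectors.
        pooled = self.proj(enc).mean(dim=1)                    # (B, lm_dim)
        aux_loss = F.mse_loss(pooled, lm_repr)
        return ctc_loss + self.aux_weight * aux_loss, ctc_loss, aux_loss


torch.manual_seed(0)
model = CTCWithAuxTarget()
feats = torch.randn(2, 50, 16)          # two utterances, 50 frames each
feat_lens = torch.tensor([50, 50])
tokens = torch.randint(1, 10, (2, 8))   # label ids 1..9; 0 is reserved for blank
token_lens = torch.tensor([8, 8])
lm_repr = torch.randn(2, 24)            # placeholder for a frozen LM's output

total, ctc_loss, aux_loss = model(feats, feat_lens, tokens, token_lens, lm_repr)
total.backward()
```

Because the language model only supplies a training target here, nothing about it is needed at inference time: decoding uses the CTC head alone, which is consistent with the abstract's aim of avoiding an external LM.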

Code & Models

Repositories

Vladimetr/ASR-Knowledge-Transferring
pytorch

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Attention Dropout · Dropout · Linear Warmup With Linear Decay · Dense Connections · Residual Connection