Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax

Keqi Deng; Philip C. Woodland

arXiv:2302.08579 · eess.AS · March 16, 2023 · 1 citation

TL;DR

This paper introduces a replaceable internal language model and a residual softmax to improve domain adaptation in end-to-end ASR models, enabling better performance on target domains without retraining.

Contribution

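It proposes the RILM method, which allows the internal LM of an E2E ASR model to be replaced directly with a target-domain LM at decoding time, and an R-softmax that lets CTC-based models adapt to the target domain at inference without re-training. Both address domain shift between training and test data.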

Findings

01 · 2.6% absolute WER reduction on Switchboard

02 · 1.0% WER reduction on AESRC2020

03 · Maintains intra-domain ASR performance

Abstract

End-to-end (E2E) automatic speech recognition (ASR) implicitly learns the token sequence distribution of paired audio-transcript training data. However, it still suffers from domain shifts from training to testing, and domain adaptation is still challenging. To alleviate this problem, this paper designs a replaceable internal language model (RILM) method, which makes it feasible to directly replace the internal language model (LM) of E2E ASR models with a target-domain LM in the decoding stage when a domain shift is encountered. Furthermore, this paper proposes a residual softmax (R-softmax) that is designed for CTC-based E2E ASR models to adapt to the target domain without re-training during inference. For E2E ASR models trained on the LibriSpeech corpus, experiments showed that the proposed methods gave a 2.6% absolute WER reduction on the Switchboard data and a 1.0% WER reduction on…
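The abstract reports an absolute WER reduction, which is easy to confuse with a relative one. The sketch below is a minimal illustration, not the authors' evaluation code: it computes word error rate with a standard word-level edit distance and then contrasts absolute and relative reduction. The function names and the baseline figure of 30.0% are hypothetical and chosen only for the arithmetic; they are not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Difference in WER, in percentage points."""
    return baseline_wer - adapted_wer


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Difference in WER as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical numbers for illustration only (not taken from the paper).
    baseline, adapted = 30.0, 27.4
    print(f"absolute: {absolute_reduction(baseline, adapted):.1f} points")
    print(f"relative: {relative_reduction(baseline, adapted):.1f} %")
```

With these illustrative numbers, the same improvement is 2.6 points in absolute terms but a larger percentage in relative terms, which is why the abstract's "absolute" qualifier matters when comparing against other reported results.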

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Softmax