Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR

Yufei Liu; Rao Ma; Haihua Xu; Yi He; Zejun Ma; Weibin Zhang

arXiv:2201.11627 · eess.AS · November 3, 2022

TL;DR

This paper introduces two novel methods for explicitly estimating the internal language model in attention-based encoder-decoder ASR systems, improving external language model integration and outperforming previous approaches.

Contribution

The paper proposes two new techniques for ILM estimation in LAS-based ASR, enhancing accuracy and external LM fusion effectiveness.

Findings

01 Achieved lowest perplexity for ILMs among tested methods.

02 Significantly outperformed shallow fusion and previous ILME approaches.

03 Validated on multiple datasets with consistent improvements.

Abstract

An end-to-end (E2E) ASR model implicitly learns a prior Internal Language Model (ILM) from the training transcripts. To fuse an external LM using Bayes posterior theory, the log likelihood produced by the ILM has to be accurately estimated and subtracted. In this paper, we propose two novel approaches to estimate the ILM based on the Listen-Attend-Spell (LAS) framework. The first method replaces the context vector of the LAS decoder at every time step with a vector that is learned from the training transcripts. The second method uses a lightweight feed-forward network to map the query vector directly to the context vector, so the context vector is produced dynamically. Because the context vectors are learned by minimizing the perplexities on training transcripts, and their estimation is independent of the encoder output, the ILMs are accurately learned for both methods. Experiments show that the ILMs…
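
The ideas in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the interpolation weights, and the single-hidden-layer ReLU network are all hypothetical choices made for the example. The first function shows the usual form of ILM-corrected fusion (add the external LM log score, subtract the estimated ILM log score); the other two show the two ways of producing a context vector without looking at the encoder output.

```python
def ilme_fused_score(log_p_e2e, log_p_ext, log_p_ilm,
                     ext_weight=0.3, ilm_weight=0.2):
    """Fused log score for one hypothesis.

    Adds the weighted external LM score and subtracts the weighted ILM
    estimate. The default weights are hypothetical tuning values, not
    values taken from the paper.
    """
    return log_p_e2e + ext_weight * log_p_ext - ilm_weight * log_p_ilm


def static_context(query, learned_vector):
    """First approach (sketch): ignore the query and the encoder output
    and return one learned vector at every decoding step."""
    return list(learned_vector)


def ffn_context(query, w1, b1, w2, b2):
    """Second approach (sketch): map the query vector to a context vector
    with a small feed-forward network.

    One hidden layer with ReLU is an assumption made for illustration;
    the abstract only says the network is lightweight.
    """
    hidden = [max(0.0, sum(w * q for w, q in zip(row, query)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


if __name__ == "__main__":
    # Toy numbers only; they do not come from any experiment.
    print(ilme_fused_score(-1.0, -2.0, -3.0, ext_weight=0.5, ilm_weight=0.5))
    print(static_context([1.0, 2.0], [0.5, 0.5]))
    print(ffn_context([1.0, -1.0],
                      w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
                      w2=[[1.0, 1.0]], b2=[0.5]))
```

In both context-vector sketches the encoder output never appears as an input, which is the property the abstract relies on: the decoder run with such a context vector depends only on the token history, so its output can serve as the ILM estimate that is subtracted during fusion.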

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques