Adaptive Discounting of Implicit Language Models in RNN-Transducers

Vinit Unni; Shreya Khare; Ashish Mittal; Preethi Jyothi; Sunita Sarawagi; Samarth Bharadwaj

arXiv:2203.02317·cs.CL·March 7, 2022

TL;DR

This paper introduces AdaptLMD, a lightweight adaptive discounting method for RNN-Transducer models that improves rare word recognition by dynamically reducing overconfidence of the internal language model without external resources.

Contribution

The paper proposes AdaptLMD, a lightweight technique that mitigates overconfidence of the implicit language model in RNN-T models, improving rare word recognition without external resources or additional parameters.

Findings

01. Up to 4% relative WER reduction on Hindi-English ASR

02. Up to 14% relative rare word PER reduction

03. Effective without external resources or additional parameters
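
The reductions above are relative, not absolute: they are measured as a fraction of the baseline error rate. A minimal sketch of that arithmetic follows. The error rates in it are hypothetical values chosen only to illustrate the formula, not results from the paper.

```python
def relative_reduction(baseline_error, new_error):
    """Fraction of the baseline error rate that was removed."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates (in percent), not taken from the paper.
baseline_wer = 25.0
improved_wer = 24.0

# One absolute point removed from a 25% baseline is a 4% relative reduction.
print(f"{relative_reduction(baseline_wer, improved_wer):.1%}")  # prints 4.0%
```

The same absolute gain looks larger in relative terms when the baseline is already low, which is why the two kinds of figures should not be compared directly.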

Abstract

RNN-Transducer (RNN-T) models have become synonymous with streaming end-to-end ASR systems. While they perform competitively on a number of evaluation categories, rare words pose a serious challenge to RNN-T models. One main reason for the degradation in performance on rare words is that the language model (LM) internal to RNN-Ts can become overconfident and lead to hallucinated predictions that are acoustically inconsistent with the underlying speech. To address this issue, we propose a lightweight adaptive LM discounting technique, AdaptLMD, that can be used with any RNN-T architecture without requiring any external resources or additional parameters. AdaptLMD uses a two-pronged approach: 1) Randomly mask the prediction network output to encourage the RNN-T to not be overly reliant on its outputs. 2) Dynamically choose when to discount the implicit LM (ILM) based on rarity of recently…
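
The first prong, random masking of the prediction network output, might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the per-utterance masking granularity, the use of zeros as the mask value, and the names `mask_prediction_output` and `mask_prob` are all hypothetical, since the abstract does not specify them.

```python
import random


def mask_prediction_output(pred_out, mask_prob, rng):
    """Return a copy of pred_out with some utterances' vectors zeroed.

    pred_out:  list over utterances; each utterance is a list over label
               positions of prediction-network output vectors (lists of floats).
    mask_prob: hypothetical hyperparameter, the chance of masking an utterance.
    rng:       a random.Random instance, so the masking is reproducible.
    """
    masked = []
    for utterance in pred_out:
        if rng.random() < mask_prob:
            # Masked (assumed behavior): the joint network would receive only
            # zeros from the prediction network for this utterance, so it has
            # to rely on the acoustic encoder.
            masked.append([[0.0] * len(vec) for vec in utterance])
        else:
            # Unmasked: pass the prediction-network output through unchanged.
            masked.append([list(vec) for vec in utterance])
    return masked


# Toy batch: 2 utterances, 3 label positions each, hidden size 4.
batch = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]
train_input = mask_prediction_output(batch, mask_prob=0.5, rng=random.Random(0))
```

In a setup like this, masking would be applied only during training. At inference the prediction network output would be passed to the joint network unmasked.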

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems