Adaptive Discounting of Implicit Language Models in RNN-Transducers
Vinit Unni, Shreya Khare, Ashish Mittal, Preethi Jyothi, Sunita Sarawagi, Samarth Bharadwaj

TL;DR
This paper introduces AdaptLMD, a lightweight adaptive discounting method for RNN-Transducer models that improves rare word recognition by dynamically reducing overconfidence of the internal language model without external resources.
Contribution
The paper proposes AdaptLMD, a novel, resource-efficient technique to mitigate overconfidence in RNN-T models' language component, enhancing rare word recognition performance.
Findings
Up to 4% relative WER reduction on Hindi-English ASR
Up to 14% relative rare word PER reduction
Effective without external resources or additional parameters
Abstract
RNN-Transducer (RNN-T) models have become synonymous with streaming end-to-end ASR systems. While they perform competitively on a number of evaluation categories, rare words pose a serious challenge to RNN-T models. One main reason for the degradation in performance on rare words is that the language model (LM) internal to RNN-Ts can become overconfident and lead to hallucinated predictions that are acoustically inconsistent with the underlying speech. To address this issue, we propose a lightweight adaptive LM discounting technique, AdaptLMD, that can be used with any RNN-T architecture without requiring any external resources or additional parameters. AdaptLMD uses a two-pronged approach: 1) Randomly mask the prediction network output to encourage the RNN-T to not be overly reliant on its outputs. 2) Dynamically choose when to discount the implicit LM (ILM) based on rarity of recently…
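The first prong, random masking of the prediction network output, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice to zero the whole vector, the additive combination with the encoder output, and the `mask_prob` parameter are all assumptions made for illustration only.

```python
import random


def mask_prediction_output(pred_out, mask_prob, rng):
    """With probability mask_prob, replace the prediction-network vector with zeros.

    Zeroing the whole vector is an assumed masking scheme, not one taken
    from the paper.
    """
    if rng.random() < mask_prob:
        return [0.0] * len(pred_out)
    return list(pred_out)


def joint_input(enc_out, pred_out):
    """Element-wise sum of encoder and prediction-network vectors.

    An additive combination is a common RNN-T design, assumed here.
    """
    return [e + g for e, g in zip(enc_out, pred_out)]


rng = random.Random(0)
enc = [0.2, -0.1, 0.5]   # hypothetical encoder (acoustic) features
pred = [1.0, 2.0, 3.0]   # hypothetical prediction-network (ILM) features

masked = mask_prediction_output(pred, mask_prob=1.0, rng=rng)  # always masked
kept = mask_prediction_output(pred, mask_prob=0.0, rng=rng)    # never masked

# When the prediction output is masked, the joint input carries only
# acoustic information, so the model cannot lean on the implicit LM.
acoustic_only = joint_input(enc, masked)
```

During training, a masking probability strictly between 0 and 1 would expose the joint network to both masked and unmasked inputs; the extreme values above are used only to make the behavior deterministic.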
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
