External Language Model Integration for Factorized Neural Transducers

Michael Levit; Sarangarajan Parthasarathy; Cem Aksoylar; Mohammad Sadegh Rasooli; Shuangyu Chang

arXiv:2305.17304 · cs.CL · May 30, 2023 · 1 citation

TL;DR

This paper introduces an adaptation method for factorized neural transducers (FNT) that integrates external language models through linear interpolation with the predictor output and through class-based n-gram models, improving speech recognition accuracy by 18% WERR on average with lexical adaptation.

Contribution

The paper presents an approach for integrating external neural and n-gram language models into FNT and shows that linear interpolation with the predictor output yields larger accuracy gains than shallow fusion.

Findings

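01 Linear interpolation of external LMs with the predictor output outperforms shallow fusion.

02 Class-based n-gram models can be integrated into FNT, with accuracy gains similar to a hybrid setup.

03 Up to 60% WERR gain in one entity-rich scenario, from combining class-based n-gram and neural LMs.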
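
For readers unfamiliar with the metric, the following is a minimal sketch of how a WERR figure is computed, assuming WERR denotes relative word error rate reduction with respect to a baseline (the usual reading of the abbreviation; the page itself does not spell it out). The function name `werr` and the numbers are hypothetical and are not taken from the paper.

```python
def werr(baseline_wer: float, adapted_wer: float) -> float:
    """Relative word error rate reduction, in percent of the baseline WER.

    Assumption: WERR = 100 * (WER_baseline - WER_adapted) / WER_baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical illustration only: a baseline WER of 10% that drops to 4%
# after adaptation corresponds to a 60% relative reduction.
example_gain = werr(baseline_wer=10.0, adapted_wer=4.0)
```

Because the reduction is relative, the same WERR can correspond to very different absolute WER changes depending on how high the baseline error rate is.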

Abstract

We propose an adaptation method for factorized neural transducers (FNT) with external language models. We demonstrate that both neural and n-gram external LMs add significantly more value when linearly interpolated with predictor output compared to shallow fusion, thus confirming that FNT forces the predictor to act like regular language models. Further, we propose a method to integrate class-based n-gram language models into FNT framework resulting in accuracy gains similar to a hybrid setup. We show average gains of 18% WERR with lexical adaptation across various scenarios and additive gains of up to 60% WERR in one entity-rich scenario through a combination of class-based n-gram and neural LMs.
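
The difference between the two combination schemes named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the toy three-token vocabulary, and the weight `lam` are all hypothetical, and where exactly each combination is applied inside the FNT is described in the paper rather than here. Linear interpolation mixes the two distributions in the probability domain, while shallow fusion is taken here in its usual log-linear form, adding a weighted external LM log-probability to the model score.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def linear_interpolation(pred_logp, ext_logp, lam):
    """Per-token log((1 - lam) * P_pred + lam * P_ext).

    The result is still a normalized log-probability distribution.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return [
        math.log((1.0 - lam) * math.exp(p) + lam * math.exp(e))
        for p, e in zip(pred_logp, ext_logp)
    ]


def shallow_fusion(pred_logp, ext_logp, lam):
    """Per-token log P_pred + lam * log P_ext (log-linear, unnormalized)."""
    return [p + lam * e for p, e in zip(pred_logp, ext_logp)]


# Toy 3-token vocabulary with made-up logits (hypothetical values).
pred_logp = log_softmax([2.0, 0.5, 0.1])  # stand-in for the FNT predictor output
ext_logp = log_softmax([0.1, 0.2, 3.0])   # stand-in for an external LM

interp = linear_interpolation(pred_logp, ext_logp, lam=0.3)
fused = shallow_fusion(pred_logp, ext_logp, lam=0.3)

interp_mass = sum(math.exp(x) for x in interp)  # stays at 1 up to rounding
fused_mass = sum(math.exp(x) for x in fused)    # generally not 1
```

In this sketch the interpolated scores remain a proper distribution, which is consistent with treating the predictor output as a regular language model; the log-linear scores do not sum to one and act only as relative scores during search.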

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis