Neural-FST Class Language Model for End-to-End Speech Recognition

Antoine Bruguier; Duc Le; Rohit Prabhavalkar; Dangna Li; Zhe Liu; Bo Wang; Eun Chang; Fuchun Peng; Ozlem Kalinli; Michael L. Seltzer

arXiv:2201.11867 · cs.CL · February 1, 2022

TL;DR

This paper introduces NFCLM, a language model for end-to-end speech recognition that combines a neural network language model with FSTs. It lowers Word Error Rate relative to a plain NNLM and is compact enough for on-device applications.

Contribution

The paper presents a framework that combines a background NNLM with domain-specific entities modeled as individual FSTs. A separately trained neural decider estimates the mixture weights used to generate each output token.

Findings

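01. NFCLM outperforms the NNLM by 15.8% relative in terms of Word Error Rate (WER).

02. NFCLM performs similarly to traditional NNLM and FST shallow fusion while being less prone to overbiasing.

03. NFCLM is 12 times more compact than traditional NNLM and FST shallow fusion, making it more suitable for on-device use.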

Abstract

We propose the Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition, a novel method that combines neural network language models (NNLMs) and finite state transducers (FSTs) in a mathematically consistent framework. Our method utilizes a background NNLM, which models generic background text, together with a collection of domain-specific entities modeled as individual FSTs. Each output token is generated by a mixture of these components; the mixture weights are estimated with a separately trained neural decider. We show that NFCLM significantly outperforms the NNLM by 15.8% relative in terms of Word Error Rate. NFCLM achieves performance similar to traditional NNLM and FST shallow fusion while being less prone to overbiasing and 12 times more compact, making it more suitable for on-device usage.
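The token-level mixture described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes each component (the background NNLM and each entity FST) already exposes a next-token distribution as a plain dictionary, and that the decider emits one score per component which is normalized with a softmax. The names `mixture_token_probs`, `background`, and `contacts_fst`, and all the numbers, are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Normalize raw scores into weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_token_probs(decider_logits, component_probs):
    """Mix per-component next-token distributions.

    decider_logits: one raw score per component (assumed decider output).
    component_probs: one dict per component mapping token -> probability.
    Returns a dict mapping token -> mixed probability.
    """
    weights = softmax(decider_logits)
    vocab = set()
    for probs in component_probs:
        vocab.update(probs)
    return {
        tok: sum(w * probs.get(tok, 0.0)
                 for w, probs in zip(weights, component_probs))
        for tok in vocab
    }


# Hypothetical toy distributions: a background model and one entity class.
background = {"call": 0.5, "play": 0.3, "alice": 0.1, "bob": 0.1}
contacts_fst = {"alice": 0.6, "bob": 0.4}

# Equal decider scores give each component a weight of 0.5.
mixed = mixture_token_probs([0.0, 0.0], [background, contacts_fst])
```

Because the weights sum to one and each component is itself a distribution, the mixed result is also a valid distribution. Raising the decider score for the contacts component shifts probability mass toward the entity names.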

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems