Neural-FST Class Language Model for End-to-End Speech Recognition
Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer

TL;DR
This paper introduces NFCLM, a language model for end-to-end speech recognition that combines a neural network language model (NNLM) with finite state transducers (FSTs), lowering word error rate while staying compact enough for on-device use.
Contribution
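The paper presents a mathematically consistent framework in which a background NNLM and a collection of domain-specific entity FSTs jointly generate each output token as a mixture, with the mixture weights estimated by a separately trained neural decider.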
Findings
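NFCLM reduces Word Error Rate (WER) by 15.8% relative compared with an NNLM.
NFCLM performs similarly to traditional NNLM and FST shallow fusion while being less prone to overbiasing.
NFCLM is 12 times more compact than traditional NNLM and FST shallow fusion, making it more suitable for on-device use.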
Abstract
We propose Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition, a novel method that combines neural network language models (NNLMs) and finite state transducers (FSTs) in a mathematically consistent framework. Our method utilizes a background NNLM which models generic background text together with a collection of domain-specific entities modeled as individual FSTs. Each output token is generated by a mixture of these components; the mixture weights are estimated with a separately trained neural decider. We show that NFCLM significantly outperforms NNLM by 15.8% relative in terms of Word Error Rate. NFCLM achieves similar performance as traditional NNLM and FST shallow fusion while being less prone to overbiasing and 12 times more compact, making it more suitable for on-device usage.
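The token-level mixture described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the component names ("background", "contacts_fst"), the toy vocabulary, and all probabilities and weights below are hypothetical, and the real system would obtain the weights from the trained neural decider and the component distributions from the NNLM and the FSTs.

```python
def mix_next_token_distribution(weights, component_dists):
    """Combine per-component next-token distributions into one distribution.

    weights:         dict mapping component name -> mixture weight (must sum to 1);
                     stands in for the output of a neural decider (assumption).
    component_dists: dict mapping component name -> {token: probability};
                     stands in for the NNLM and entity-FST outputs (assumption).
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")

    mixed = {}
    for name, weight in weights.items():
        for token, prob in component_dists[name].items():
            # P(token | history) = sum_k w_k * P_k(token | history)
            mixed[token] = mixed.get(token, 0.0) + weight * prob
    return mixed


# Hypothetical decider output: mostly background text, some contact-name entity.
weights = {"background": 0.7, "contacts_fst": 0.3}

# Hypothetical next-token distributions from each component.
component_dists = {
    "background": {"call": 0.5, "mom": 0.1, "alice": 0.4},
    "contacts_fst": {"alice": 0.6, "bob": 0.4},
}

mixed = mix_next_token_distribution(weights, component_dists)
for token in sorted(mixed):
    print(f"{token}: {mixed[token]:.2f}")
```

Because each component distribution sums to 1 and the weights sum to 1, the mixed distribution also sums to 1, which is one way to read the abstract's "mathematically consistent" claim. A token supported only by an entity FST (here "bob") still receives probability mass even though the background model assigns it none.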
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
