Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems
Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon

TL;DR
This paper investigates the impact of large-scale language model (LLM) rescoring on a competitive Conformer-Transducer ASR system. It shows consistent improvements from the LLM's bidirectionality, pretraining, in-domain finetuning, and context augmentation, and analyzes how each may contribute.
Contribution
It introduces LLM rescoring into a competitive ASR baseline and analyzes how different LLM features improve ASR performance.
Findings
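LLM rescoring yields consistent improvements over a competitive Conformer-Transducer ASR baseline.
Bidirectionality, pretraining, in-domain finetuning, and context augmentation each improve rescoring effectiveness.
Lexical analysis sheds light on how each of these components may contribute to ASR performance.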
Abstract
Large-scale language models (LLMs) such as GPT-2, BERT and RoBERTa have been successfully applied to ASR N-best rescoring. However, whether or how they can benefit competitive, near state-of-the-art ASR systems remains unexplored. In this study, we incorporate LLM rescoring into one of the most competitive ASR baselines: the Conformer-Transducer model. We demonstrate that consistent improvement is achieved by the LLM's bidirectionality, pretraining, in-domain finetuning and context augmentation. Furthermore, our lexical analysis sheds light on how each of these components may be contributing to the ASR performance.
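To make the idea of N-best rescoring concrete, here is a minimal sketch. This page does not state the paper's scoring formula, so the example assumes a common formulation: each hypothesis's ASR log-score is combined with a weighted language model log-score, and the list is re-ranked. The names `rescore_nbest`, `toy_lm_score`, and `lm_weight` are hypothetical, and the lookup-table scorer is a stand-in for a real model such as GPT-2 or BERT.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (text, ASR log-score)


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank hypotheses by ASR score plus weighted LM score.

    Assumption: log-linear interpolation; the paper's exact
    combination rule is not given on this page.
    """
    rescored = [
        (text, asr_score + lm_weight * lm_score(text))
        for text, asr_score in nbest
    ]
    # Highest combined log-score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_lm_score(text: str) -> float:
    """Stand-in for an LLM log-probability (made-up values)."""
    table = {
        "i scream for ice cream": -2.0,
        "ice cream for ice cream": -5.0,
    }
    return table.get(text, -10.0)


# Made-up 2-best list: the ASR model slightly prefers the second hypothesis.
nbest = [
    ("i scream for ice cream", -4.0),
    ("ice cream for ice cream", -3.8),
]

for text, score in rescore_nbest(nbest, toy_lm_score, lm_weight=0.5):
    print(f"{score:.2f}  {text}")
```

With `lm_weight=0.0` the ranking falls back to the ASR scores alone; in practice the weight would be tuned on held-out data.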
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Cosine Annealing · Byte Pair Encoding · Dense Connections · Attention Dropout · Linear Warmup With Cosine Annealing · Adam
