Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems

Takuma Udagawa; Masayuki Suzuki; Gakuto Kurata; Nobuyasu Itoh; George Saon

arXiv:2204.00212·cs.CL·August 19, 2022

TL;DR

This paper investigates the effect of large-scale language model (LLM) rescoring on a competitive Conformer-Transducer ASR system. It shows consistent improvements from the LLM's bidirectionality, pretraining, in-domain finetuning and context augmentation, and analyzes the contribution of each.

Contribution

It incorporates LLM N-best rescoring into a competitive ASR baseline, the Conformer-Transducer model, and uses lexical analysis to examine how each LLM component may contribute to ASR performance.

Findings

01. LLM rescoring yields consistent ASR improvements.

02. Bidirectionality, pretraining, and in-domain finetuning enhance rescoring effectiveness.

03. Lexical analysis reveals the contribution of each LLM component.

Abstract

Large-scale language models (LLMs) such as GPT-2, BERT and RoBERTa have been successfully applied to ASR N-best rescoring. However, whether or how they can benefit competitive, near state-of-the-art ASR systems remains unexplored. In this study, we incorporate LLM rescoring into one of the most competitive ASR baselines: the Conformer-Transducer model. We demonstrate that consistent improvement is achieved by the LLM's bidirectionality, pretraining, in-domain finetuning and context augmentation. Furthermore, our lexical analysis sheds light on how each of these components may be contributing to the ASR performance.
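
The abstract refers to N-best rescoring without spelling it out. As a rough illustration, the general idea can be sketched as follows: the ASR system emits several candidate transcripts with scores, a language model scores each candidate text, and the two scores are combined to re-rank the list. This is a minimal sketch under stated assumptions, not the paper's implementation. The log-linear interpolation, the `lm_weight` parameter, the function names and the toy unigram table are all hypothetical; in practice the scorer would be an LLM such as GPT-2, BERT or RoBERTa.

```python
import math
from typing import Callable, List, Tuple

# A hypothesis is (transcript text, ASR log-score). Illustrative type only.
Hypothesis = Tuple[str, float]


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_log_prob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank an N-best list by ASR score plus a weighted LM score.

    Assumption: scores are combined by simple log-linear interpolation.
    """
    rescored = [
        (text, asr_score + lm_weight * lm_log_prob(text))
        for text, asr_score in nbest
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy stand-in for an LLM: a unigram table with made-up probabilities.
TOY_UNIGRAM = {
    "recognize": 0.02,
    "speech": 0.03,
    "wreck": 0.001,
    "a": 0.05,
    "nice": 0.01,
    "beach": 0.005,
}
UNKNOWN_PROB = 1e-4  # hypothetical floor for unseen words


def toy_lm_log_prob(text: str) -> float:
    """Sum of unigram log-probabilities over whitespace tokens."""
    return sum(math.log(TOY_UNIGRAM.get(w, UNKNOWN_PROB)) for w in text.split())


# The ASR scores here are invented for the example.
nbest = [("wreck a nice beach", -4.0), ("recognize speech", -4.5)]

first_pass_best = rescore_nbest(nbest, toy_lm_log_prob, lm_weight=0.0)[0][0]
rescored_best = rescore_nbest(nbest, toy_lm_log_prob, lm_weight=0.5)[0][0]
```

With `lm_weight=0.0` the ranking is the ASR system's own first-pass ordering; a positive weight lets the language model promote the hypothesis it finds more plausible. A summed log-probability favors shorter hypotheses, so real systems often add a length term, which this sketch omits.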

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Cosine Annealing · Byte Pair Encoding · Dense Connections · Attention Dropout · Linear Warmup With Cosine Annealing · Adam