Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

Chao-Han Huck Yang; Yile Gu; Yi-Chieh Liu; Shalini Ghosh; Ivan Bulyko; Andreas Stolcke

arXiv:2309.15649 · cs.CL · January 29, 2024 · 2 cites

TL;DR

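This paper investigates using large language models as post-processors for speech recognition, employing various prompting techniques to perform rescoring and error correction without extensive fine-tuning. Prompting alone yields rescoring results competitive with domain-tuned language models, and combining prompting with fine-tuning reaches error rates below the N-best oracle.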

Contribution

It introduces a novel task-activating prompting method and demonstrates that LLMs can perform speech recognition rescoring and error correction effectively through prompting alone.
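
As a rough illustration of what a task-activating prompt could look like, the Python sketch below assembles a chat-style prompt in three stages: a few activation question/answer turns that steer the model toward the ASR task, a handful of demonstrations pairing an N-best list with its correct transcript, and finally the query N-best list left for the model to complete. The function names (`build_tap_prompt`, `format_nbest`) and all prompt wording are hypothetical; this is a sketch under those assumptions, not the authors' exact prompt text.

```python
from typing import List, Tuple


def format_nbest(hypotheses: List[str]) -> str:
    """Render an N-best list as numbered lines (1-based)."""
    return "\n".join(f"{i}. {h}" for i, h in enumerate(hypotheses, start=1))


def build_tap_prompt(
    activation_turns: List[Tuple[str, str]],
    demonstrations: List[Tuple[List[str], str]],
    nbest: List[str],
) -> str:
    """Assemble a hypothetical task-activating prompt as plain text.

    activation_turns: (question, answer) pairs that prime the model on the task.
    demonstrations: (N-best list, reference transcript) pairs for in-context learning.
    nbest: the hypotheses the model should correct or choose among.
    """
    parts: List[str] = []
    # 1) Activation turns: questions that walk the model toward ASR post-processing.
    for question, answer in activation_turns:
        parts.append(f"Q: {question}\nA: {answer}")
    # 2) In-context demonstrations: an N-best list paired with its true transcript.
    for hyps, reference in demonstrations:
        parts.append(
            "Q: Here are N-best ASR hypotheses:\n"
            f"{format_nbest(hyps)}\n"
            "Report the most likely true transcription.\n"
            f"A: {reference}"
        )
    # 3) The query itself, with the answer slot left open for the LLM.
    parts.append(
        "Q: Here are N-best ASR hypotheses:\n"
        f"{format_nbest(nbest)}\n"
        "Report the most likely true transcription.\nA:"
    )
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_tap_prompt(
        activation_turns=[("Do you know speech recognition?", "Yes, ...")],
        demonstrations=[(["i scream", "ice cream"], "ice cream")],
        nbest=["whether report for boston", "weather report for boston"],
    )
    print(prompt)
```

Because the activation turns come before the demonstrations, the same builder covers zero-shot prompting (no demonstrations), few-shot prompting (no activation turns), and the combined form.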

Findings

1. In-context learning with frozen LLMs achieves rescoring results competitive with domain-tuned LMs.
2. Combining prompting with fine-tuning yields error rates below the N-best oracle.
3. The proposed methods generalize to out-of-domain speech recognition tasks (ATIS and WSJ).
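
The N-best oracle is the lowest error rate obtainable by picking the best hypothesis from each N-best list. Below is a minimal per-utterance sketch in Python, assuming whitespace tokenization and a non-empty reference; corpus-level WER would instead pool errors and reference words over all utterances. The helper names are illustrative.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            )
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate of one hypothesis against one reference."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hyp.split()) / len(ref_words)


def oracle_wer(ref: str, nbest: List[str]) -> float:
    """Best WER achievable by *selecting* one hypothesis from the N-best list."""
    return min(wer(ref, h) for h in nbest)


if __name__ == "__main__":
    ref = "show me flights to boston"
    nbest = ["show me flights to austin", "show me flight to boston"]
    print(oracle_wer(ref, nbest))  # each hypothesis has 1 error in 5 words -> 0.2
    print(wer(ref, "show me flights to boston"))  # a corrected output -> 0.0
```

Rescoring can only select one of the N hypotheses, so it never beats `oracle_wer`. A generative corrector can output a word sequence absent from the list (such as the exact reference above), which is how prompting combined with fine-tuning can land below the oracle.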

Abstract

We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction. Our first focus is on instruction prompting to let LLMs perform these tasks without fine-tuning, for which we evaluate different prompting schemes, both zero- and few-shot in-context learning, and a novel task activation prompting method that combines causal instructions and demonstrations to increase its context window. Next, we show that rescoring only by in-context learning with frozen LLMs achieves results that are competitive with rescoring by domain-tuned LMs, using a pretrained first-pass recognition system and rescoring output on two out-of-domain tasks (ATIS and WSJ). By combining prompting techniques with fine-tuning we achieve error rates below the N-best oracle level, showcasing the generalization power of the LLMs.
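
Rescoring, unlike error correction, only reranks the existing hypotheses. A common scheme, assumed here rather than taken from the paper, is log-linear interpolation: each hypothesis keeps its first-pass score, gains a weighted LM log-probability, and the top-scoring one wins. In the setting described above, `lm_logprob` would wrap a frozen LLM (for instance, summing the token log-probabilities of the hypothesis given an in-context prompt). Here it is a hypothetical callable, and `lm_weight` is an illustrative value that would normally be tuned on held-out data.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> str:
    """Pick the hypothesis maximizing first_pass_score + lm_weight * lm_logprob(text).

    nbest: (hypothesis text, first-pass log score) pairs.
    lm_logprob: hypothetical scorer returning a log-probability for a text.
    """
    if not nbest:
        raise ValueError("N-best list is empty")
    best_text, _ = max(
        nbest,
        key=lambda item: item[1] + lm_weight * lm_logprob(item[0]),
    )
    return best_text


if __name__ == "__main__":
    # Toy stand-in for an LLM scorer (made-up numbers for illustration only).
    toy_lm = {"show me flights to austin": -8.0, "show me flights to boston": -5.0}
    nbest = [("show me flights to austin", -10.0), ("show me flights to boston", -10.5)]
    # austin: -10.0 + 0.5 * -8.0 = -14.0; boston: -10.5 + 0.5 * -5.0 = -13.0
    print(rescore_nbest(nbest, toy_lm.__getitem__))  # -> show me flights to boston
```

With `lm_weight=0.0` the function falls back to the first-pass ranking, which makes the weight an easy knob for checking how much the LM actually changes the output.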

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques