Retrieval Augmented Correction of Named Entity Speech Recognition Errors

Ernest Pusateri; Anmol Walia; Anirudh Kashi; Bortik Bandyopadhyay; Nadia Hyder; Sayantan Mahinder; Raviteja Anantha; Daben Liu; Sashank Gondala

arXiv:2409.06062 · eess.AS · September 11, 2024

Open Access

TL;DR

This paper introduces a retrieval-augmented method that uses a vector database and large language models to correct entity name errors in speech recognition, significantly reducing error rates for rare entities.

Contribution

It presents a novel retrieval-augmented correction technique for ASR errors in entity names, leveraging LLMs and vector databases to improve accuracy.

Findings

01 Achieves 33%-39% relative word error rate (WER) reduction on synthetic test sets.

02 No regression observed on the STOP voice assistant test set.

03 Effective correction of rare music entity names.
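To make the "relative WER reduction" figure concrete, here is a minimal sketch of how word error rate and its relative reduction are commonly computed. The function names and the example numbers are illustrative assumptions, not values or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error rate that the new system removes."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, chosen only to illustrate the arithmetic:
# one substituted word out of three reference words.
example_wer = word_error_rate("play taylor swift", "play tailor swift")
# A drop from 20% to 13% WER is a 35% relative reduction.
example_reduction = relative_wer_reduction(0.20, 0.13)
```

A relative reduction is measured against the baseline error rate, so a 33%-39% relative reduction means roughly a third or more of the baseline's word errors are removed, not that the absolute WER falls by 33-39 points.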

Abstract

In recent years, end-to-end automatic speech recognition (ASR) systems have proven themselves remarkably accurate and performant, but these systems still have a significant error rate for entity names which appear infrequently in their training data. In parallel to the rise of end-to-end ASR systems, large language models (LLMs) have proven to be a versatile tool for various natural language processing (NLP) tasks. In NLP tasks where a database of relevant knowledge is available, retrieval augmented generation (RAG) has achieved impressive results when used with LLMs. In this work, we propose a RAG-like technique for correcting speech recognition entity name errors. Our approach uses a vector database to index a set of relevant entities. At runtime, database queries are generated from possibly errorful textual ASR hypotheses, and the entities retrieved using these queries are fed, along…
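The retrieval step described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: it stands in character trigram counts for a learned embedding, uses an in-memory list in place of a vector database, and uses the whole ASR hypothesis as the query. `EntityIndex`, `char_ngrams`, and `build_prompt` are hypothetical names, and the prompt wording is invented for the example.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Bag of character n-grams; a crude stand-in for a learned embedding."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class EntityIndex:
    """In-memory stand-in for a vector database of entity names."""

    def __init__(self, entities):
        self.entries = [(name, char_ngrams(name)) for name in entities]

    def search(self, query: str, k: int = 3):
        """Return the k entity names most similar to the query text."""
        query_vec = char_ngrams(query)
        ranked = sorted(self.entries,
                        key=lambda entry: cosine(query_vec, entry[1]),
                        reverse=True)
        return [name for name, _ in ranked[:k]]


def build_prompt(hypothesis: str, candidates) -> str:
    """Combine the ASR hypothesis and retrieved entities into LLM input text."""
    listed = "\n".join(f"- {name}" for name in candidates)
    return (
        "Correct any misrecognized entity names in the transcript.\n"
        f"Transcript: {hypothesis}\n"
        f"Possibly relevant entities:\n{listed}\n"
        "Corrected transcript:"
    )


index = EntityIndex(["Taylor Swift", "Tyler Childers", "Daft Punk"])
hypothesis = "play tailor swift"           # possibly errorful ASR output
candidates = index.search(hypothesis, k=2)  # retrieve similar entity names
prompt = build_prompt(hypothesis, candidates)
```

The resulting `prompt` would then be passed to an LLM; that call is omitted here because the excerpt does not specify the model or how it is invoked.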

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training