Towards interfacing large language models with ASR systems using confidence measures and prompting

Maryam Naderi; Enno Hermann; Alexandre Nanchen; Sevada Hovsepyan; Mathew Magimai.-Doss

arXiv:2407.21414·eess.AS·September 25, 2024

Open Access

TL;DR

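This paper explores using large language models (LLMs) to correct automatic speech recognition (ASR) transcripts after decoding, combining prompting with confidence-based filtering so that transcripts likely to be accurate are left alone. The results indicate gains for less competitive ASR systems.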

Contribution

It proposes a range of confidence-based filtering methods for post-hoc correction of ASR transcripts with LLMs, intended to avoid introducing errors into transcripts that are likely already accurate.

Findings

01

Confidence-based filtering helps avoid introducing errors into transcripts that are likely accurate

02

LLMs can correct ASR transcripts post hoc when guided by confidence measures

03

Post-hoc correction can improve the performance of less competitive ASR systems

Abstract

As large language models (LLMs) grow in parameter size and capabilities, such as interaction through prompting, they open up new ways of interfacing with automatic speech recognition (ASR) systems beyond rescoring n-best lists. This work investigates post-hoc correction of ASR transcripts with LLMs. To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods. Our results indicate that this can improve the performance of less competitive ASR systems.
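The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: this page does not describe their confidence measures, thresholds, or prompts, so the mean word confidence, the `threshold` value of 0.8, and the names `Hypothesis`, `utterance_confidence`, and `correct_if_uncertain` are all assumptions made for illustration. The `corrector` argument stands in for any LLM call that maps a transcript to a corrected transcript.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    """An ASR transcript with one confidence score in [0, 1] per word (hypothetical type)."""
    words: List[str]
    confidences: List[float]


def utterance_confidence(hyp: Hypothesis) -> float:
    """Average the word-level confidences; an empty transcript scores 0.0.

    Averaging is an assumption; other aggregations (e.g. the minimum) are possible.
    """
    if not hyp.confidences:
        return 0.0
    return sum(hyp.confidences) / len(hyp.confidences)


def correct_if_uncertain(
    hyp: Hypothesis,
    corrector: Callable[[str], str],
    threshold: float = 0.8,  # illustrative value, not taken from the paper
) -> str:
    """Send the transcript to the corrector only when its confidence is below the threshold."""
    text = " ".join(hyp.words)
    if utterance_confidence(hyp) >= threshold:
        # Likely accurate: leave untouched so the corrector cannot introduce errors.
        return text
    return corrector(text)


def toy_corrector(text: str) -> str:
    """Stand-in for an LLM call; fixes a single known misrecognition."""
    return text.replace("lite", "light")


confident = Hypothesis(["turn", "on", "the", "light"], [0.95, 0.90, 0.97, 0.92])
uncertain = Hypothesis(["turn", "on", "the", "lite"], [0.90, 0.85, 0.90, 0.30])

print(correct_if_uncertain(confident, toy_corrector))  # passed through unchanged
print(correct_if_uncertain(uncertain, toy_corrector))  # routed to the corrector
```

In this sketch the threshold controls the trade-off: raising it sends more transcripts to the corrector, while lowering it protects more of the ASR output from being altered.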

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems