Towards interfacing large language models with ASR systems using confidence measures and prompting
Maryam Naderi, Enno Hermann, Alexandre Nanchen, Sevada Hovsepyan, Mathew Magimai.-Doss

TL;DR
This paper explores using large language models to improve automatic speech recognition outputs through confidence-based filtering and prompting, enhancing less accurate ASR systems.
Contribution
It introduces confidence-based filtering methods for post-hoc correction of ASR transcripts using LLMs, which is a novel approach for improving ASR performance.
Findings
Confidence filtering improves ASR accuracy
LLMs effectively correct transcripts when guided by confidence measures
Enhanced performance for less competitive ASR systems
Abstract
As large language models (LLMs) grow in parameter size and capabilities, such as interaction through prompting, they open up new ways of interfacing with automatic speech recognition (ASR) systems beyond rescoring n-best lists. This work investigates post-hoc correction of ASR transcripts with LLMs. To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods. Our results indicate that this can improve the performance of less competitive ASR systems.
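The filtering idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not specify how confidence is computed or thresholded, so the mean-of-word-confidences score, the 0.9 threshold, and the names `Hypothesis`, `utterance_confidence`, `correct_if_uncertain`, and `correct_fn` are all assumptions introduced here. The LLM call is represented by a plain callable so the sketch stays independent of any particular model or API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    """A 1-best ASR hypothesis with per-word confidence scores (hypothetical type)."""
    words: List[str]
    word_confidences: List[float]  # assumed to lie in [0, 1], one per word


def utterance_confidence(hyp: Hypothesis) -> float:
    """Mean word confidence. An illustrative choice, not the paper's definition."""
    if not hyp.word_confidences:
        return 0.0  # no evidence: treat the transcript as uncertain
    return sum(hyp.word_confidences) / len(hyp.word_confidences)


def correct_if_uncertain(
    hyp: Hypothesis,
    correct_fn: Callable[[str], str],
    threshold: float = 0.9,  # assumed value; would need tuning on held-out data
) -> str:
    """Pass the transcript to the corrector only when confidence is below threshold.

    Transcripts at or above the threshold are returned unchanged, so the
    corrector cannot introduce errors into transcripts that are likely accurate.
    """
    text = " ".join(hyp.words)
    if utterance_confidence(hyp) >= threshold:
        return text
    return correct_fn(text)
```

In use, `correct_fn` would wrap a prompt to an LLM asking it to fix the transcript. Any function from string to string works for experimentation, which makes it easy to compare filtered and unfiltered correction on the same ASR output.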
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
