Leveraging Large Language Models for Exploiting ASR Uncertainty

Pranay Dighe; Yi Su; Shangshang Zheng; Yunshu Liu; Vineet Garg; Xiaochuan Niu; Ahmed Tewfik

arXiv:2309.04842·cs.CL·September 13, 2023

Open Access

TL;DR

This paper shows that prompting large language models with n-best ASR hypotheses, rather than only the 1-best hypothesis, improves speech intent classification and keyword spotting, exploiting ASR uncertainty without substantially changing the underlying ASR system or LLM.

Contribution

The paper introduces a method that prompts LLMs with the n-best hypotheses from an ASR system, improving spoken language understanding without substantially modifying the underlying ASR system or LLM.
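
As a rough illustration of the idea, the sketch below formats an n-best list into a single text prompt for an intent classifier. The function name build_nbest_prompt, the template wording, the example hypotheses, and the intent labels are all hypothetical; the page does not give the paper's actual prompt format, so this is only one plausible shape. Setting n=1 yields the 1-best baseline for comparison.

```python
from typing import Sequence


def build_nbest_prompt(hypotheses: Sequence[str], labels: Sequence[str], n: int = 5) -> str:
    """Format the top-n ASR hypotheses into one classification prompt.

    Assumption: the template wording is illustrative, not the paper's.
    """
    if not hypotheses:
        raise ValueError("need at least one ASR hypothesis")
    # Keep only the n highest-ranked hypotheses (input is assumed sorted best-first).
    top = list(hypotheses)[:n]
    lines = [f"{rank}. {text}" for rank, text in enumerate(top, start=1)]
    return (
        "Below are candidate transcriptions of one spoken utterance, "
        "ordered from most to least likely by the speech recognizer.\n"
        + "\n".join(lines)
        + "\nChoose the intent of the utterance from: "
        + ", ".join(labels)
        + "\nIntent:"
    )


# Hypothetical n-best list and label set, for illustration only.
nbest = ["play some jazz", "play sum jazz", "play some chess"]
prompt = build_nbest_prompt(nbest, ["play_music", "set_alarm"], n=2)
print(prompt)
```

The resulting string would then be passed to whichever LLM is in use; the ASR system and the LLM themselves stay untouched, which is the point of the approach.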

Findings

01. n-best list prompts outperform 1-best hypotheses in speech tasks

02. Prompt engineering and fine-tuning improve LLM performance on spoken language understanding

03. Approach is effective on device-directed speech detection and keyword spotting

Abstract

While large language models (LLMs) excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription or be equipped with a built-in speech modality. This work focuses on the former scenario, where the LLM's accuracy on SLU tasks is constrained by the accuracy of a fixed ASR system on the spoken input. Specifically, we tackle the speech-intent classification task, where a high word error rate can limit the LLM's ability to understand the spoken intent. Instead of chasing high accuracy by designing complex or specialized architectures regardless of deployment costs, we seek to answer how far we can go without substantially changing the underlying ASR and LLM, which can potentially be shared by multiple unrelated tasks. To this end, we propose…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis