Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI

Yuya Asano; Sabit Hassan; Paras Sharma; Anthony Sicilia; Katherine Atwell; Diane Litman; Malihe Alikhani

arXiv:2501.06129·cs.CL·January 13, 2025

Open Access

TL;DR

This paper introduces a method for correcting ASR errors in goal-oriented conversational AI: it augments dialogue context with a large language model and ranks n-best ASR hypotheses against that context, improving correction recall by 34% and F1 by 16% while maintaining precision and false positive rate.

Contribution

The paper presents a new context augmentation and ranking strategy using large language models that enhances ASR correction without prior user data, applicable to linguistically flexible tasks.

Findings

01. 34% improvement in recall

02. 16% improvement in F1 score

03. Higher user satisfaction ratings

Abstract

General-purpose automatic speech recognition (ASR) systems do not always perform well in goal-oriented dialogue. Existing ASR correction methods rely on prior user data or named entities. We extend correction to tasks that have no prior user data and exhibit linguistic flexibility such as lexical and syntactic variations. We propose a novel context augmentation with a large language model and a ranking strategy that incorporates contextual information from the dialogue states of a goal-oriented conversational AI and its tasks. Our method ranks (1) n-best ASR hypotheses by their lexical and semantic similarity with context and (2) context by phonetic correspondence with ASR hypotheses. Evaluated in home improvement and cooking domains with real-world users, our method improves recall and F1 of correction by 34% and 16%, respectively, while maintaining precision and false positive rate.…
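
The two-way ranking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the LLM context augmentation step is omitted, word-set overlap stands in for semantic similarity, and a simplified consonant-class key (loosely inspired by Soundex) stands in for phonetic correspondence. It only shows the shape of the idea: score each ASR hypothesis against context phrases, and score each context phrase against the hypotheses by how similar they sound.

```python
from difflib import SequenceMatcher

# Consonant classes loosely inspired by Soundex (an assumption, not the paper's phonetic model).
_CODES = {
    c: d
    for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items()
    for c in letters
}


def lexical_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets; a crude placeholder for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def phonetic_key(text: str) -> str:
    """Map each word to its consonant-class digits, collapsing adjacent repeats."""
    keys = []
    for word in text.lower().split():
        digits = []
        for ch in word:
            d = _CODES.get(ch, "")  # vowels, h, w, y and punctuation are skipped
            if d and (not digits or digits[-1] != d):
                digits.append(d)
        keys.append("".join(digits))
    return " ".join(keys)


def rank_hypotheses(hypotheses: list[str], context: list[str]) -> list[str]:
    """Order n-best hypotheses by their best lexical + overlap score against context."""
    def score(h: str) -> float:
        return max(
            (0.5 * lexical_similarity(h, c) + 0.5 * token_overlap(h, c) for c in context),
            default=0.0,
        )
    return sorted(hypotheses, key=score, reverse=True)


def rank_context(context: list[str], hypotheses: list[str]) -> list[str]:
    """Order context phrases by their best phonetic-key match against any hypothesis."""
    def score(c: str) -> float:
        return max(
            (SequenceMatcher(None, phonetic_key(c), phonetic_key(h)).ratio()
             for h in hypotheses),
            default=0.0,
        )
    return sorted(context, key=score, reverse=True)
```

For instance, with hypothetical context phrases `["add the flour", "preheat the oven"]` taken from a recipe step, `rank_hypotheses(["at the flower", "add the flour", "add the floor"], context)` puts "add the flour" first, and `rank_context(context, ["at the flower"])` also puts "add the flour" first because the misrecognized phrase shares its consonant-class key. The equal 0.5 weights are arbitrary; a real system would tune them and use a proper semantic and phonetic model.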

Taxonomy

Topics: Speech and dialogue systems · Intelligent Tutoring Systems and Adaptive Learning · Human-Automation Interaction and Safety