Applying LLMs for Rescoring N-best ASR Hypotheses of Casual Conversations: Effects of Domain Adaptation and Context Carry-over

Atsunori Ogawa; Naoyuki Kamo; Kohei Matsuura; Takanori Ashihara; Takafumi Moriya; Takatomo Kano; Naohiro Tawara; Marc Delcroix

arXiv:2406.18972·eess.AS·June 28, 2024·2 cites

Open Access

TL;DR

This paper explores the use of Llama2, a large language model, for rescoring N-best ASR hypotheses in casual conversations, highlighting the effects of domain adaptation and context carry-over on performance and computational efficiency.

Contribution

The paper demonstrates that Llama2 is effective for rescoring ASR hypotheses of casual conversations and analyzes how domain adaptation and context length influence performance and computational cost.

Findings

1. Even without domain adaptation, Llama2 outperforms a standard-size domain-adapted Transformer-LM, especially when using a long context.

2. Domain adaptation shortens the context length Llama2 needs to reach its best performance.

3. Longer context improves Llama2's rescoring accuracy.

Abstract

Large language models (LLMs) have been successfully applied for rescoring automatic speech recognition (ASR) hypotheses. However, their ability to rescore ASR hypotheses of casual conversations has not been sufficiently explored. In this study, we reveal it by performing N-best ASR hypotheses rescoring using Llama2 on the CHiME-7 distant ASR (DASR) task. Llama2 is one of the most representative LLMs, and the CHiME-7 DASR task provides datasets of casual conversations between multiple participants. We investigate the effects of domain adaptation of the LLM and context carry-over when performing N-best rescoring. Experimental results show that, even without domain adaptation, Llama2 outperforms a standard-size domain-adapted Transformer-LM, especially when using a long context. Domain adaptation shortens the context length needed with Llama2 to achieve its best performance, i.e., it…
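
As a rough illustration of N-best rescoring with context carry-over, the following minimal Python sketch may help. It is not the authors' implementation. It assumes that each hypothesis comes with an ASR log-score, that the final score is the ASR score plus a weighted LM log-probability, and that the context consists of the previously selected hypotheses truncated to a fixed number of words. The names `rescore_nbest`, `rescore_conversation`, `toy_lm_logprob`, `lm_weight`, and `max_context_words` are hypothetical, and the toy scorer only stands in for a real LLM such as Llama2.

```python
import math
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (text, ASR log-score)
LmScorer = Callable[[str, str], float]  # (context, text) -> log-probability


def toy_lm_logprob(context: str, text: str) -> float:
    """Hypothetical stand-in for an LLM: words already seen in the context are more likely."""
    seen = set(context.split())
    return sum(math.log(0.5) if w in seen else math.log(0.1) for w in text.split())


def rescore_nbest(nbest: List[Hypothesis], lm_logprob: LmScorer,
                  context: str, lm_weight: float = 0.5) -> str:
    """Pick the hypothesis maximizing ASR score + lm_weight * LM log-prob given the context."""
    best = max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(context, h[0]))
    return best[0]


def rescore_conversation(utterances: List[List[Hypothesis]], lm_logprob: LmScorer,
                         max_context_words: int = 50, lm_weight: float = 0.5) -> List[str]:
    """Rescore utterances in order, carrying selected hypotheses over as context."""
    context: List[str] = []
    results: List[str] = []
    for nbest in utterances:
        best = rescore_nbest(nbest, lm_logprob, " ".join(context), lm_weight)
        results.append(best)
        # Assumption: context = most recent words of the selected hypotheses.
        combined = context + best.split()
        context = combined[-max_context_words:] if max_context_words > 0 else []
    return results


# Toy N-best lists for two consecutive utterances (made-up scores).
conversation = [
    [("i bought a new bike", -1.0), ("i bought a new bite", -1.2)],
    [("the bite is red", -2.0), ("the bike is red", -2.1)],
]
with_context = rescore_conversation(conversation, toy_lm_logprob, max_context_words=50)
without_context = rescore_conversation(conversation, toy_lm_logprob, max_context_words=0)
print(with_context)
print(without_context)
```

In this toy setup, carrying over the first utterance lets the scorer prefer "the bike is red" for the second utterance, while a context length of zero leaves the ASR ranking unchanged. A longer `max_context_words` means more tokens to score per hypothesis, which is the accuracy versus cost trade-off the abstract refers to.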

Taxonomy

Topics: Topic Modeling