ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction

Victor Junqiu Wei; Weicheng Wang; Di Jiang; Yuanfeng Song; Lu Wang

arXiv:2412.03075 · cs.CL · December 5, 2024

TL;DR

This paper introduces the first Chinese ASR error correction benchmark dataset and evaluates large language models' effectiveness in correcting ASR errors using various paradigms, highlighting multi-modal augmentation as the most effective approach.

Contribution

It creates the first Chinese ASR error correction benchmark and systematically investigates LLM-based correction methods, proposing multi-modal augmentation as the most effective technique.

Findings

01 Multi-modal augmentation outperforms prompting and finetuning.

02 Prompting methods are generally ineffective for ASR error correction.

03 Finetuning improves performance for some LLMs.

Abstract

Automatic Speech Recognition (ASR) is a fundamental task in speech and natural language processing. It is a building block of many applications, such as voice assistants and speech translation. Despite recent advances in ASR technology, modern ASR systems still produce a substantial number of recognition errors due to environmental noise, ambiguity, and other factors. Error correction for ASR output is therefore crucial. Motivated by this, this paper studies ASR error correction for Chinese, one of the most widely spoken languages in the world. We first create a benchmark dataset named ASR-EC that contains a wide spectrum of ASR errors generated by industry-grade ASR systems. To the best of our knowledge, it is the first Chinese ASR error correction benchmark. Then,…

Taxonomy

Topics: Speech Recognition and Synthesis