ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction
Victor Junqiu Wei, Weicheng Wang, Di Jiang, Yuanfeng Song, Lu Wang

TL;DR
This paper introduces the first Chinese ASR error correction benchmark dataset and evaluates how well large language models correct ASR errors through prompting, finetuning, and multi-modal augmentation, finding multi-modal augmentation to be the most effective approach.
Contribution
The paper creates the first Chinese ASR error correction benchmark, systematically investigates LLM-based correction methods, and finds multi-modal augmentation to be the most effective technique.
Findings
Multi-modal augmentation outperforms prompting and finetuning.
Prompting methods are generally ineffective for ASR error correction.
Finetuning improves performance for some LLMs.
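To make comparisons like those above concrete, here is a minimal sketch of how a correction could be scored. This summary does not say which metric the paper uses; character error rate (CER), a common choice for Chinese, is assumed here, and the function names and example sentences are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # drop a reference character
                            curr[j - 1] + 1,       # extra hypothesis character
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: the ASR output confuses two homophones (气 / 器).
reference = "今天天气很好"
asr_output = "今天天器很好"
corrected = "今天天气很好"  # what an ideal corrector would return

cer_before = cer(reference, asr_output)
cer_after = cer(reference, corrected)
```

A correction method helps on an utterance when `cer_after` is lower than `cer_before`; averaging both over a test set gives a simple before/after comparison between methods.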
Abstract
Automatic Speech Recognition (ASR) is a fundamental task in speech and natural language processing and a building block of many applications, such as voice assistants and speech translation. Despite recent advances in ASR technology, modern ASR systems still inevitably produce a substantial number of recognition errors due to environmental noise, ambiguity, and other factors, so error correction in ASR is crucial. Motivated by this, this paper studies ASR error correction in Chinese, one of the most widely spoken languages in the world. We first create a benchmark dataset named ASR-EC that contains a wide spectrum of ASR errors generated by industry-grade ASR systems. To the best of our knowledge, it is the first Chinese ASR error correction benchmark. Then,…
Taxonomy
Topics: Speech Recognition and Synthesis
