Full-text Error Correction for Chinese Speech Recognition with Large Language Model
Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang

TL;DR
This paper explores the use of large language models for correcting errors in full-text transcriptions from long speech recordings in Chinese, creating a new dataset and evaluating different prompt strategies.
Contribution
It introduces ChFT, a new Chinese full-text error correction dataset, and demonstrates effective fine-tuning of LLMs for comprehensive error correction in long speech transcripts.
Findings
LLMs perform well with different prompts in full-text correction
The ChFT dataset enables correction of diverse error types
Prompt design impacts correction performance
Abstract
Large Language Models (LLMs) have demonstrated substantial potential for error correction in Automatic Speech Recognition (ASR). However, most research focuses on utterances from short-duration speech recordings, which are the predominant form of speech data for supervised ASR training. This paper investigates the effectiveness of LLMs for error correction in full text generated by ASR systems from longer speech recordings, such as transcripts from podcasts, news broadcasts, and meetings. First, we develop a Chinese dataset for full-text error correction, named ChFT, using a pipeline that involves text-to-speech synthesis, ASR, and an error-correction pair extractor. This dataset enables us to correct errors across contexts, at both the full-text and segment level, and to address a broader range of error types, such as punctuation restoration and inverse text normalization, thus making…
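The abstract names an error-correction pair extractor as the last stage of the ChFT pipeline but does not describe how it works. As a rough illustration only, the sketch below assumes the extractor aligns an ASR hypothesis with its reference text character by character and keeps the spans that differ. The function name `extract_correction_pairs` and the use of Python's `difflib` are assumptions made for this example, not the authors' implementation.

```python
from difflib import SequenceMatcher


def extract_correction_pairs(hypothesis: str, reference: str) -> list[tuple[str, str]]:
    """Return (erroneous, corrected) spans where the hypothesis differs from the reference.

    Hypothetical helper: the paper's actual extractor may align text differently.
    """
    # autojunk=False disables difflib's heuristic that ignores very frequent
    # characters, which could otherwise distort alignment on long transcripts.
    matcher = SequenceMatcher(None, hypothesis, reference, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete', and 'insert' all become one pair; an empty
            # string on either side marks a pure insertion or deletion.
            pairs.append((hypothesis[i1:i2], reference[j1:j2]))
    return pairs


if __name__ == "__main__":
    # A homophone substitution plus a missing full stop.
    print(extract_correction_pairs("今天天汽很好", "今天天气很好。"))
```

Character-level alignment suits Chinese, which has no whitespace word boundaries, and the same pair format covers punctuation restoration (an empty erroneous span paired with a punctuation mark) as well as substitutions.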
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training
