Full-text Error Correction for Chinese Speech Recognition with Large Language Model

Zhiyuan Tang; Dong Wang; Shen Huang; Shidong Shang

arXiv:2409.07790 · cs.CL · December 24, 2024

TL;DR

This paper explores the use of large language models for correcting errors in full-text transcriptions from long speech recordings in Chinese, creating a new dataset and evaluating different prompt strategies.

Contribution

It introduces ChFT, a new Chinese full-text error correction dataset, and demonstrates effective fine-tuning of LLMs for comprehensive error correction in long speech transcripts.

Findings

1. LLMs perform well with different prompts in full-text correction.
2. The ChFT dataset enables correction of diverse error types.
3. Prompt design impacts correction performance.
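
The prompt-design finding above, together with the abstract's distinction between full-text and segment-level correction, can be illustrated with a minimal Python sketch of the two input granularities. It is an assumption throughout: the instruction wording, the `max_chars` limit, and the helper names `split_segments` and `build_prompts` are hypothetical and not taken from the paper. The segmenter splits at Chinese sentence-final punctuation (。！？) and hard-splits unpunctuated runs, since raw ASR output may lack punctuation.

```python
import re

# Hypothetical instruction; the paper's actual prompt wording is not reproduced here.
INSTRUCTION = (
    "The following text was transcribed by a speech recognizer. "
    "Correct recognition errors, restore punctuation, and apply inverse "
    "text normalization. Output only the corrected text."
)


def split_segments(text: str, max_chars: int = 200) -> list[str]:
    """Split a long transcript into segments of at most max_chars characters."""
    # Zero-width split after sentence-final punctuation keeps it with its sentence.
    sentences = [s for s in re.split(r"(?<=[。！？])", text) if s]
    pieces = []
    for s in sentences:
        # Hard-split sentences longer than max_chars (e.g. unpunctuated ASR output).
        pieces.extend(s[i:i + max_chars] for i in range(0, len(s), max_chars))
    # Greedily pack consecutive pieces into segments without exceeding max_chars.
    segments, current = [], ""
    for p in pieces:
        if current and len(current) + len(p) > max_chars:
            segments.append(current)
            current = ""
        current += p
    if current:
        segments.append(current)
    return segments


def build_prompts(text: str, mode: str = "full", max_chars: int = 200) -> list[str]:
    """Build one full-text prompt, or one prompt per segment."""
    if mode == "full":
        return [f"{INSTRUCTION}\n\n{text}"]
    if mode == "segment":
        return [f"{INSTRUCTION}\n\n{seg}" for seg in split_segments(text, max_chars)]
    raise ValueError(f"unknown mode: {mode}")


# Usage: three short sentences become three segment-level prompts.
prompts = build_prompts("今天天气很好。我们去公园吧！好的。", mode="segment", max_chars=8)
```

The two modes trade cross-sentence context (one full-text prompt) against shorter inputs (several segment prompts).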

Abstract

Large Language Models (LLMs) have demonstrated substantial potential for error correction in Automatic Speech Recognition (ASR). However, most research focuses on utterances from short-duration speech recordings, which are the predominant form of speech data for supervised ASR training. This paper investigates the effectiveness of LLMs for correcting errors in the full texts that ASR systems generate from longer speech recordings, such as transcripts of podcasts, news broadcasts, and meetings. First, we develop a Chinese dataset for full-text error correction, named ChFT, using a pipeline that involves text-to-speech synthesis, ASR, and an error-correction pair extractor. This dataset enables us to correct errors across contexts, at both the full-text and segment levels, and to address a broader range of error types, such as punctuation restoration and inverse text normalization, thus making…
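
The abstract names an error-correction pair extractor but does not describe how it works. One plausible approach, sketched here in Python as an assumption rather than the authors' method, is to align each ASR hypothesis with the source text that was synthesized, character by character, using the standard library's `difflib.SequenceMatcher`, and emit every differing span, padded with a little shared context, as an (erroneous, corrected) pair. The helper name `extract_pairs` and its `context` parameter are hypothetical.

```python
import difflib


def extract_pairs(hyp: str, ref: str, context: int = 1) -> list[tuple[str, str]]:
    """Return (erroneous span, corrected span) pairs from a character-level
    alignment of an ASR hypothesis with its reference text.

    Hypothetical helper: each pair is widened by up to `context` characters
    taken from the neighbouring matching regions, which are identical in
    both strings.
    """
    # autojunk=False so frequent characters are not treated as junk in long texts.
    matcher = difflib.SequenceMatcher(a=hyp, b=ref, autojunk=False)
    ops = matcher.get_opcodes()
    pairs = []
    for k, (tag, i1, i2, j1, j2) in enumerate(ops):
        if tag == "equal":
            continue  # no recognition error in this region
        # Shared left context: tail of the preceding "equal" block, if any.
        left = 0
        if k > 0 and ops[k - 1][0] == "equal":
            left = min(context, ops[k - 1][2] - ops[k - 1][1])
        # Shared right context: head of the following "equal" block, if any.
        right = 0
        if k + 1 < len(ops) and ops[k + 1][0] == "equal":
            right = min(context, ops[k + 1][2] - ops[k + 1][1])
        pairs.append((hyp[i1 - left:i2 + right], ref[j1 - left:j2 + right]))
    return pairs


# Usage: a homophone substitution (汽 for 气) plus a missing full stop.
print(extract_pairs("今天天汽很好", "今天天气很好。"))
# → [('天汽很', '天气很'), ('好', '好。')]
```

Because the reference keeps its punctuation and written-form numbers, the same alignment also surfaces punctuation and normalization differences, not only character substitutions.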

Code & Models

Datasets

tzyll/ChFT (dataset, 65 downloads)

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training