Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models

Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang

arXiv:2407.01909 · cs.CL · September 25, 2024

TL;DR

This paper introduces a Chinese-specific benchmark dataset for ASR error correction, proposes Pinyin regularization to improve LLM performance, and demonstrates its effectiveness through experiments.

Contribution

The paper creates the Chinese Hypotheses Paradise (ChineseHP) dataset and proposes Pinyin regularization to enhance LLM-based error correction for Chinese ASR.

Findings

01. Pinyin regularization improves error-correction accuracy.
02. The dataset contains 724K hypotheses-transcription pairs.
03. Pinyin-regularized prompts consistently outperform prompts without regularization.
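The findings speak of error-correction accuracy without naming a metric. For Chinese ASR the customary measure is character error rate (CER): the character-level edit distance between hypothesis and reference transcription, divided by the reference length. The sketch below assumes that metric and scores one hypothetical hypotheses-transcription pair before and after correction; the helper names and the example sentence are illustrative, not taken from ChineseHP.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate; guards against an empty reference."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


if __name__ == "__main__":
    reference = "今天天气真好"    # hypothetical ground-truth transcription
    asr_output = "金田天气真好"   # homophone errors: 今→金, 天→田
    print(cer(reference, asr_output))  # 2 substitutions over 6 characters
    print(cer(reference, reference))   # a perfect correction scores 0.0
```

Because Chinese text has no whitespace word boundaries, scoring at the character level avoids depending on a word segmenter.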

Abstract

Recent studies have demonstrated the efficacy of large language models (LLMs) in error correction for automatic speech recognition (ASR). However, much of this research focuses on English. This paper turns attention to Chinese. First, we construct a specialized benchmark dataset for error correction in Chinese ASR with 724K hypotheses-transcription pairs, named the Chinese Hypotheses Paradise dataset (ChineseHP), which covers a wide range of scenarios and presents significant challenges. We then use the dataset to conduct a preliminary evaluation of pre-trained LLMs under both direct prompting and fine-tuning. Furthermore, we propose a straightforward Pinyin regularization method for prompts, in which Pinyin is transcribed directly from the text hypotheses. The experimental results reveal that Pinyin regularization consistently enhances the…
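A minimal Python sketch of how a Pinyin-regularized prompt might be assembled from N-best hypotheses follows. The prompt wording, the helper names `to_pinyin` and `build_prompt`, and the tiny `TOY_PINYIN` table are hypothetical illustrations, not the authors' implementation; a real system would use a complete character-to-Pinyin converter (for example `lazy_pinyin` from the `pypinyin` package). The example uses a homophone error, where the wrong hypothesis and the correct one share identical Pinyin, which is plausibly the kind of cue the added Pinyin exposes to the LLM.

```python
# Hypothetical toy character-to-Pinyin table (tones omitted). It covers only the
# characters in the example below; unknown characters pass through unchanged.
TOY_PINYIN = {
    "今": "jin", "金": "jin",
    "天": "tian", "田": "tian",
    "气": "qi", "真": "zhen", "好": "hao",
}


def to_pinyin(text: str) -> str:
    """Transcribe a text hypothesis into space-separated Pinyin, character by character."""
    return " ".join(TOY_PINYIN.get(ch, ch) for ch in text)


def build_prompt(hypotheses: list[str], use_pinyin: bool = True) -> str:
    """Assemble an error-correction prompt from N-best ASR hypotheses.

    With use_pinyin=True each hypothesis is followed by its Pinyin transcription
    (an assumed rendering of Pinyin regularization); with False the prompt is
    the text-only baseline.
    """
    lines = ["Correct the Chinese ASR output. N-best hypotheses:"]
    for rank, hyp in enumerate(hypotheses, start=1):
        line = f"{rank}. {hyp}"
        if use_pinyin:
            line += f" (Pinyin: {to_pinyin(hyp)})"
        lines.append(line)
    lines.append("Output the corrected transcription only.")
    return "\n".join(lines)


if __name__ == "__main__":
    nbest = ["金田天气真好", "今天天气真好"]  # hypothetical N-best list
    print(build_prompt(nbest))
```

Passing `use_pinyin=False` yields the text-only baseline, so the two prompt variants differ only in the added Pinyin.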


Code & Models

Repositories

tzyll/ChineseHP (official)


Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Softmax · Attention Is All You Need