Human Transcription Quality Improvement

Jian Gao; Hanbo Sun; Cheng Cao; Zheng Du

arXiv:2309.14372·cs.CL·September 27, 2023

TL;DR

This paper presents a method for improving crowdsourced speech transcription quality through confidence-based reprocessing at the labeling stage and automatic word error correction afterwards. The method reduces transcription word error rate (WER) by over 50% and yields over 10% relative WER reduction for ASR models trained on the improved data.

Contribution

It introduces a transcription quality improvement approach and releases LibriCrowd, a large-scale crowdsourced dataset of transcriptions for 100 hours of English speech, addressing the cost and quality issues in speech data collection.

Findings

1. Transcription WER reduced by over 50%
2. Transcription quality correlates strongly with ASR performance
3. Provides a new dataset and tools for the research community

Abstract

High-quality transcription data is crucial for training automatic speech recognition (ASR) systems. However, existing industry-level data collection pipelines are expensive for researchers, while the quality of crowdsourced transcription is low. In this paper, we propose a reliable method to collect speech transcriptions. We introduce two mechanisms to improve transcription quality: confidence-estimation-based reprocessing at the labeling stage, and automatic word error correction at the post-labeling stage. We collect and release LibriCrowd, a large-scale crowdsourced dataset of audio transcriptions on 100 hours of English speech. Experiments show that the transcription WER is reduced by over 50%. We further investigate the impact of transcription errors on ASR model performance and find a strong correlation. The transcription quality improvement provides over 10% relative WER reduction for ASR…
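The abstract reports its results as WER and relative WER reduction. As a reminder of what those two quantities measure, here is a minimal sketch using the standard word-level edit distance definition. The function names `word_error_rate` and `relative_reduction` are hypothetical, whitespace tokenization is an assumption, and any text normalization or scoring details used by the authors are not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(wer_before: float, wer_after: float) -> float:
    """Relative WER reduction, e.g. 0.5 means the error rate was halved."""
    if wer_before <= 0:
        raise ValueError("wer_before must be positive")
    return (wer_before - wer_after) / wer_before


if __name__ == "__main__":
    # Illustrative sentences only, not data from the paper.
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Illustrative error rates only: a drop from 10% to 4% WER.
    print(relative_reduction(0.10, 0.04))
```

Under this definition, a reduction "by over 50%" means the relative reduction exceeds 0.5, which is a different statement from a drop of 50 absolute percentage points.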

Code & Models

Repositories

GenerateAI/LibriCrowd (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing