Exploring Methods for the Automatic Detection of Errors in Manual Transcription
Xiaofei Wang, Jinyi Yang, Ruizhi Li, Samik Sadhu, Hynek Hermansky

TL;DR
This paper investigates automatic methods for detecting errors in manual speech transcriptions, aiming to improve data quality for deep learning by leveraging acoustic and language model approaches evaluated on real datasets.
Contribution
It introduces a novel acoustic model-based approach for error detection in transcriptions, complementing existing language model methods, and evaluates both on real, error-containing datasets.
Findings
Acoustic model approach effectively detects transcription errors.
Language model approach relies on transcription-dependent bias.
Combined methods improve error detection accuracy.
Abstract
Data quality plays an important role in most deep learning tasks. In the speech community, transcription of speech recordings is indispensable. Since transcriptions are usually produced manually, automatically finding errors in them not only saves time and labor but also benefits the performance of tasks that require training. Inspired by the success of hybrid automatic speech recognition, which uses both a language model and an acoustic model, this work explores two approaches to automatic error detection in transcriptions. We review a previous study that uses a biased language model approach, relying on a strong transcription-dependent language model. We then propose a novel acoustic-model-based approach that focuses on the phonetic sequence of speech. Both methods have been evaluated on a completely real dataset, which was originally…
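To make the idea of checking a transcription against the phonetic sequence of the audio more concrete, here is a minimal sketch. It is not the paper's method: it assumes each utterance already comes with two phone sequences, one derived from the manual transcription (for example via a pronunciation lexicon) and one decoded from the audio by some acoustic model, and it flags utterances where the two disagree too much. The function names, the use of plain edit distance, and the 0.3 threshold are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_mismatch_rate(transcript_phones: Sequence[str],
                        decoded_phones: Sequence[str]) -> float:
    """Edit distance normalized by the length of the transcript phone sequence."""
    if not transcript_phones:
        return 1.0 if decoded_phones else 0.0
    return edit_distance(transcript_phones, decoded_phones) / len(transcript_phones)


def flag_suspect_utterances(
    utterances: Sequence[Tuple[str, Sequence[str], Sequence[str]]],
    threshold: float = 0.3,  # hypothetical cut-off, would need tuning on real data
) -> List[str]:
    """Return IDs of utterances whose transcript and decoded phones disagree strongly."""
    return [utt_id
            for utt_id, transcript_phones, decoded_phones in utterances
            if phone_mismatch_rate(transcript_phones, decoded_phones) > threshold]


if __name__ == "__main__":
    # Toy data: (utterance ID, phones from transcript, phones decoded from audio).
    data = [
        ("utt1", ["k", "ae", "t"], ["k", "ae", "t"]),
        ("utt2", ["d", "ao", "g"], ["k", "ae", "t"]),
    ]
    print(flag_suspect_utterances(data))
```

In practice the decoded phone sequence is itself noisy, so a threshold like this would trade off missed errors against false alarms, and any flagged utterance would still need a human check.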
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
