Improving Readability for Automatic Speech Recognition Transcription
Junwei Liao, Sefik Emre Eskimez, Liyang Lu, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

TL;DR
This paper introduces a new NLP task called ASR post-processing for readability (APR), aiming to improve the clarity of automatic speech recognition transcripts for humans and downstream tasks, and proposes a synthesis-based training method and evaluation metrics.
Contribution
It defines the APR task, develops a data synthesis method using GEC, TTS, and ASR, and demonstrates that fine-tuned models significantly enhance transcript readability.
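The synthesis method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each GEC pair is (ungrammatical sentence, corrected sentence), that the ungrammatical side is passed through TTS and then ASR to produce the noisy input, and that the corrected side serves as the readable target. The names `AprExample`, `synthesize_apr_examples`, `toy_tts`, and `toy_asr` are hypothetical, and the toy functions only stand in for real speech systems.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class AprExample:
    source: str  # noisy ASR-style transcript (model input)
    target: str  # readable reference text (model output)


def synthesize_apr_examples(
    gec_pairs: Iterable[Tuple[str, str]],
    tts: Callable[[str], bytes],
    asr: Callable[[bytes], str],
) -> List[AprExample]:
    """Turn GEC (ungrammatical, corrected) pairs into APR training pairs.

    Assumption: the ungrammatical sentence is spoken by TTS and re-transcribed
    by ASR, so the input carries both speaker-style and recognizer-style noise.
    """
    examples = []
    for ungrammatical, corrected in gec_pairs:
        audio = tts(ungrammatical)   # text -> synthetic speech
        transcript = asr(audio)      # synthetic speech -> noisy transcript
        examples.append(AprExample(source=transcript, target=corrected))
    return examples


# Toy stand-ins (hypothetical): a real pipeline would call actual TTS/ASR systems.
def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


def toy_asr(audio: bytes) -> str:
    # Mimic typical raw ASR output: lowercase, no punctuation.
    text = audio.decode("utf-8").lower()
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


if __name__ == "__main__":
    pairs = [("She go to school yesterday.", "She went to school yesterday.")]
    for ex in synthesize_apr_examples(pairs, toy_tts, toy_asr):
        print(ex.source, "->", ex.target)
```

Passing the TTS and ASR stages in as callables keeps the sketch independent of any particular speech toolkit; a sequence-to-sequence model would then be fine-tuned on the resulting `source`/`target` pairs.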
Findings
Fine-tuned models outperform traditional pipeline methods.
Synthetic data generation effectively trains APR models.
Proposed metrics provide meaningful evaluation of readability improvements.
Abstract
Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to grammatical errors, disfluency, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline. In this work, we propose a novel NLP task called ASR post-processing for readability (APR) that aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker. In addition, we describe a method to address the lack of task-specific data by synthesizing examples for the APR task using the datasets collected for Grammatical Error Correction (GEC)…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
