Improving Readability for Automatic Speech Recognition Transcription

Junwei Liao; Sefik Emre Eskimez; Liyang Lu; Yu Shi; Ming Gong; Linjun Shou; Hong Qu; Michael Zeng

arXiv:2004.04438 · cs.CL · April 10, 2020 · 19 cites

TL;DR

This paper introduces a new NLP task, ASR post-processing for readability (APR), which aims to make automatic speech recognition transcripts more readable for both humans and downstream tasks. It also proposes a data-synthesis method for training APR models and metrics for evaluating them.

Contribution

The paper defines the APR task, develops a data-synthesis method that combines grammatical error correction (GEC) datasets with text-to-speech (TTS) and ASR, and shows that fine-tuned models substantially improve transcript readability.
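The synthesis step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each GEC example is an (ungrammatical, corrected) sentence pair, that the ungrammatical side is spoken with TTS and transcribed with ASR to produce the noisy input, and that the corrected side serves as the readable target. `tts` and `asr` are hypothetical callables standing in for real engines, and `normalize_like_asr` (lowercasing and stripping punctuation) is an assumption about what raw ASR output looks like.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-ins: any TTS engine (text -> audio bytes) and any ASR
# engine (audio bytes -> text) could be plugged in. These aliases are assumptions.
TextToSpeech = Callable[[str], bytes]
SpeechRecognizer = Callable[[bytes], str]


def normalize_like_asr(text: str) -> str:
    """Assumed ASR-style normalization: lowercase, no punctuation, single spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # replace punctuation with spaces
    return " ".join(text.split())          # collapse runs of whitespace


def synthesize_apr_pairs(
    gec_pairs: Iterable[Tuple[str, str]],
    tts: TextToSpeech,
    asr: SpeechRecognizer,
) -> List[Tuple[str, str]]:
    """Turn (ungrammatical, corrected) GEC pairs into (noisy input, readable target) pairs."""
    examples: List[Tuple[str, str]] = []
    for ungrammatical, corrected in gec_pairs:
        audio = tts(ungrammatical)   # speak the sentence that contains errors
        transcript = asr(audio)      # recognize it; a real ASR adds its own errors
        examples.append((normalize_like_asr(transcript), corrected))
    return examples
```

With trivial stand-ins such as `lambda s: s.encode("utf-8")` for TTS and `lambda a: a.decode("utf-8")` for ASR, the pair ("She go to school yesterday.", "She went to school yesterday.") becomes ("she go to school yesterday", "She went to school yesterday."). A real ASR engine would additionally inject recognition errors, which is what makes the synthesized input resemble genuine transcripts.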

Findings

1. Fine-tuned models outperform traditional pipeline methods.
2. Synthetically generated data is effective for training APR models.
3. The proposed metrics provide a meaningful evaluation of readability improvements.
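For context on the first finding, a "traditional pipeline" chains independent stages (for example, disfluency removal, then truecasing, then punctuation), each addressing one readability problem. The toy sketch below illustrates only that structure; the filler list and every rule in it are hypothetical and are not the baselines evaluated in the paper, where such stages would normally be trained models.

```python
from typing import Callable, List

# Each stage maps text to text. The rules are deliberately simplistic and
# hypothetical; they only illustrate how a staged pipeline is composed.
Stage = Callable[[str], str]

FILLERS = {"uh", "um", "er"}  # hypothetical filler-word list


def remove_fillers(text: str) -> str:
    return " ".join(w for w in text.split() if w not in FILLERS)


def remove_repetitions(text: str) -> str:
    """Drop immediately repeated words, e.g. 'i i think' -> 'i think'."""
    out: List[str] = []
    for w in text.split():
        if not out or out[-1] != w:
            out.append(w)
    return " ".join(out)


def truecase(text: str) -> str:
    """Capitalize the pronoun 'i' and the first word of the sentence."""
    words = ["I" if w == "i" else w for w in text.split()]
    if words:
        words[0] = words[0][0].upper() + words[0][1:]
    return " ".join(words)


def add_final_period(text: str) -> str:
    return text if not text or text[-1] in ".?!" else text + "."


def run_pipeline(text: str, stages: List[Stage]) -> str:
    # Each stage sees only the previous stage's output.
    for stage in stages:
        text = stage(text)
    return text


PIPELINE: List[Stage] = [remove_fillers, remove_repetitions, truecase, add_final_period]
```

For example, `run_pipeline("um i i think we should uh go", PIPELINE)` returns "I think we should go.". Because each stage works only on its predecessor's output, a mistake made early (such as deleting an intentional "that that") cannot be repaired by later stages, which illustrates how errors can compound across a pipeline.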

Abstract

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript can still be challenging to read due to grammatical errors, disfluency, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline. In this work, we propose a novel NLP task called ASR post-processing for readability (APR) that aims to transform the noisy ASR output into readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker. In addition, we describe a method to address the lack of task-specific data by synthesizing examples for the APR task using the datasets collected for Grammatical Error Correction (GEC)…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification