Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model
Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

TL;DR
This paper introduces a post-processing model, built on a fine-tuned RoBERTa, that improves the readability of ASR transcripts, reducing errors and making the output more usable for human readers and downstream tasks.
Contribution
It presents a novel data augmentation and two-stage training approach for fine-tuning a pre-trained language model to produce more human-readable ASR transcripts.
Findings
Outperforms the baseline by 13.26 on RA-WER
Achieves a BLEU score 17.53 points higher than the baseline
Human evaluation shows improved readability
Abstract
Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript can still be challenging to read due to disfluency, filler words, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline. In this work, we propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into a readable text for humans and downstream tasks. We leverage the Metadata Extraction (MDE) corpus to construct a task-specific dataset for our study. Since the dataset is small, we propose a novel data augmentation method and use a two-stage training strategy to fine-tune the RoBERTa pre-trained model. On the…
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
Methods: Linear Layer · Linear Warmup With Linear Decay · Softmax · Adam · Multi-Head Attention · Residual Connection · Dropout · WordPiece · Attention Dropout
