Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing
Hao Zhang, Felix Stahlberg, Shankar Kumar

TL;DR
This paper introduces a phrase-based edit representation for LLMs in ASR post editing, achieving a better efficiency-accuracy balance by reducing output length and computational cost while maintaining high correction quality.
Contribution
It proposes a novel phrase-based edit representation inspired by statistical machine translation, outperforming span-based methods in efficiency-accuracy trade-offs for ASR post editing.
Findings
Closes 50-60% of the WER gap between the edit span model and the full rewrite model on the LibriSpeech test set
Loses only 10-20% of the edit span model's output length reduction rate
Target-phrase-only edit representation gives the best efficiency-accuracy trade-off among the representations compared
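The "WER gap" figure above can be read as a simple ratio. The sketch below assumes the gap is measured between an edit span model (higher WER) and a full rewrite model (lower WER), and reports what fraction of that gap a third model recovers. The function name and the numbers are illustrative assumptions, not values from the paper.

```python
def wer_gap_closed(wer_span: float, wer_full: float, wer_method: float) -> float:
    """Fraction of the WER gap between a span model and a full rewrite
    model that is recovered by another method (hypothetical helper)."""
    gap = wer_span - wer_full
    if gap <= 0:
        raise ValueError("expected the span model WER to exceed the full rewrite WER")
    return (wer_span - wer_method) / gap


# Illustrative numbers only; these are NOT results reported in the paper.
fraction = wer_gap_closed(wer_span=3.0, wer_full=2.0, wer_method=2.4)
print(f"gap closed: {fraction:.0%}")
```

A value of 0 means the method is no better than the span model, and 1 means it matches the full rewrite model.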
Abstract
Large Language Models (LLMs) excel at rewriting tasks such as text style transfer and grammatical error correction. While there is considerable overlap between the inputs and outputs in these tasks, the decoding cost still increases with output length, regardless of the amount of overlap. By leveraging the overlap between the input and the output, Kaneko and Okazaki (2023) proposed model-agnostic edit span representations to compress the rewrites to save computation. They reported an output length reduction rate of nearly 80% with minimal accuracy impact in four rewriting tasks. In this paper, we propose alternative edit phrase representations inspired by phrase-based statistical machine translation. We systematically compare our phrasal representations with their span representations. We apply the LLM rewriting model to the task of Automatic Speech Recognition (ASR) post editing and…
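To make the idea of compressing a rewrite concrete, here is a minimal sketch of two compact encodings of the same correction: index-based edit spans and source/target phrase pairs. It is an illustration under stated assumptions, not the exact format used by Kaneko and Okazaki (2023) or by this paper. The word-level alignment via Python's `difflib`, the helper names, and the token-counting convention are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def edit_spans(source, target):
    """Index-based edits: (start, end, replacement_words) over source words."""
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    return [
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def phrase_pairs(source, target):
    """Phrase-based edits: (source_phrase, target_phrase), no indices."""
    return [(source[i1:i2], repl) for i1, i2, repl in edit_spans(source, target)]


def apply_spans(source, spans):
    """Reconstruct the full rewrite from the source and its edit spans."""
    out, cursor = [], 0
    for start, end, repl in spans:
        out.extend(source[cursor:start])  # copy the unchanged words
        out.extend(repl)                  # emit the replacement
        cursor = end
    out.extend(source[cursor:])
    return out


def length_reduction_rate(full_len, compact_len):
    """Share of output tokens saved relative to emitting the full rewrite."""
    return 1.0 - compact_len / full_len


hyp = "the cat sad on the mat".split()  # hypothetical ASR output
ref = "the cat sat on the mat".split()  # hypothetical corrected text

spans = edit_spans(hyp, ref)
pairs = phrase_pairs(hyp, ref)

# Assumed cost model: two index tokens per span plus its replacement words.
span_len = sum(2 + len(repl) for _, _, repl in spans)
rate = length_reduction_rate(len(ref), span_len)
```

In this toy case the single substitution is encoded in a few tokens instead of the whole six-word sentence, which is the source of the decoding savings the abstract describes. Decoding cost in a real system depends on the model's subword tokenization, which this sketch ignores.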
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
