Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing

Hao Zhang; Felix Stahlberg; Shankar Kumar

arXiv:2501.13831 · cs.CL · January 24, 2025

TL;DR

This paper introduces a phrase-based edit representation for LLM-based ASR post editing. The model predicts compact phrasal edits rather than a full rewrite, which shortens the output and lowers decoding cost while keeping correction quality close to that of full rewriting.

Contribution

It proposes a novel phrase-based edit representation inspired by statistical machine translation, outperforming span-based methods in efficiency-accuracy trade-offs for ASR post editing.

Findings

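1. On the LibriSpeech test set, closes 50-60% of the WER gap between the edit span model and the full rewrite model.

2. Gives up only 10-20% of the edit span model's length reduction rate in exchange.

3. The target-phrase-only edit representation offers the best efficiency-accuracy trade-off among the representations compared.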
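
The WER-gap and output-length figures above rest on a few simple quantities, which the following minimal sketch makes concrete. The definitions are assumptions for illustration and are not taken from the paper: length reduction rate is taken as one minus the ratio of compact to full output length, and "WER gap closed" as the fraction of the span-to-full-rewrite WER difference that a model recovers. The function names are hypothetical, and the numbers in the usage lines are made up, not reported results.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Standard dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def length_reduction_rate(full_len: int, compact_len: int) -> float:
    """Assumed definition: fraction of output tokens saved vs. a full rewrite."""
    return 1.0 - compact_len / full_len


def wer_gap_closed(wer_span: float, wer_full: float, wer_model: float) -> float:
    """Assumed definition: share of the span-vs-full WER gap recovered by a model."""
    return (wer_span - wer_model) / (wer_span - wer_full)


# Illustrative values only (not from the paper).
print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
print(length_reduction_rate(full_len=10, compact_len=2))
print(wer_gap_closed(wer_span=3.0, wer_full=2.0, wer_model=2.5))  # 0.5
```

Under these assumed definitions, a model that sits halfway between the span model and the full rewrite model in WER closes 50% of the gap.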

Abstract

Large Language Models (LLMs) excel at rewriting tasks such as text style transfer and grammatical error correction. While there is considerable overlap between the inputs and outputs in these tasks, the decoding cost still increases with output length, regardless of the amount of overlap. By leveraging the overlap between the input and the output, Kaneko and Okazaki (2023) proposed model-agnostic edit span representations to compress the rewrites to save computation. They reported an output length reduction rate of nearly 80% with minimal accuracy impact in four rewriting tasks. In this paper, we propose alternative edit phrase representations inspired by phrase-based statistical machine translation. We systematically compare our phrasal representations with their span representations. We apply the LLM rewriting model to the task of Automatic Speech Recognition (ASR) post editing and…
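
To make the idea of compressing a rewrite concrete, here is a minimal sketch of two compact output formats derived from a word-level diff. It is an illustration under assumptions, not the exact format used by Kaneko and Okazaki (2023) or by this paper: span edits are modeled as `(start, end, replacement)` triples indexing into the source words, and phrase edits as `(source phrase, target phrase)` pairs. The alignment comes from Python's `difflib`, which is a stand-in for whatever alignment the authors use, and all function names are hypothetical.

```python
from difflib import SequenceMatcher


def _diff_ops(source: str, target: str):
    """Word-level diff; returns source words, target words, and non-equal opcodes."""
    src, tgt = source.split(), target.split()
    ops = SequenceMatcher(a=src, b=tgt, autojunk=False).get_opcodes()
    return src, tgt, [op for op in ops if op[0] != "equal"]


def span_edits(source: str, target: str):
    """Position-based edits: (start, end, replacement) over source word indices."""
    _, tgt, ops = _diff_ops(source, target)
    return [(i1, i2, " ".join(tgt[j1:j2])) for _, i1, i2, j1, j2 in ops]


def phrase_edits(source: str, target: str):
    """Phrase-pair edits: (source phrase, target phrase), with no positions."""
    src, tgt, ops = _diff_ops(source, target)
    return [(" ".join(src[i1:i2]), " ".join(tgt[j1:j2]))
            for _, i1, i2, j1, j2 in ops]


def apply_span_edits(source: str, edits) -> str:
    """Rebuild the full rewrite; applying right to left keeps indices valid."""
    words = source.split()
    for start, end, replacement in sorted(edits, reverse=True):
        words[start:end] = replacement.split()
    return " ".join(words)


hypothesis = "the whether is nice today"   # made-up ASR output
corrected = "the weather is nice today"    # made-up reference rewrite

print(span_edits(hypothesis, corrected))    # [(1, 2, 'weather')]
print(phrase_edits(hypothesis, corrected))  # [('whether', 'weather')]
print(apply_span_edits(hypothesis, span_edits(hypothesis, corrected)))
```

In both formats the model would emit only the changed material instead of all five words, which is where the decoding savings come from when input and output overlap heavily.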

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis