FlanEC: Exploring Flan-T5 for Post-ASR Error Correction

Moreno La Quatra; Valerio Mario Salerno; Yu Tsao; Sabato Marco Siniscalchi

arXiv:2501.12979 · cs.CL · January 23, 2025

1 repository · 7 models

TL;DR

This paper introduces FlanEC, a Flan-T5-based encoder-decoder model for post-ASR error correction, and shows that scaling the training data and increasing dataset diversity improve transcription accuracy and grammaticality.

Contribution

The paper presents a novel application of Flan-T5 for post-ASR error correction, exploring data scaling and dataset diversity to enhance performance.

Findings

01

Scaling training data improves correction accuracy.

02

Incorporating diverse datasets enhances grammaticality.

03

FlanEC effectively maps n-best hypotheses into accurate transcriptions.
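The n-best-to-one mapping can be pictured as a preprocessing step. The ASR hypotheses are concatenated into a single source string that a text-to-text model such as Flan-T5 reads, and the model is trained to emit the reference transcription. The instruction wording, the numbering scheme, and the helper name `build_nbest_input` below are hypothetical illustrations, not the prompt format actually used by FlanEC.

```python
from typing import List

# Hypothetical instruction text: the real FlanEC template is not given on this page.
INSTRUCTION = "Generate the correct transcription for the following n-best list of ASR hypotheses:"


def build_nbest_input(hypotheses: List[str], n: int = 5) -> str:
    """Join the top-n ASR hypotheses into one source string for a seq2seq model."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    # Keep only the n highest-ranked hypotheses, in ASR rank order.
    top = [h.strip() for h in hypotheses[:n]]
    # Number each hypothesis on its own line so the model can tell them apart.
    lines = [f"{i}. {h}" for i, h in enumerate(top, start=1)]
    return INSTRUCTION + "\n" + "\n".join(lines)


nbest = ["i scream for ice cream", "eye scream for ice cream", "i scream four ice cream"]
src = build_nbest_input(nbest, n=2)
print(src)
```

Under this assumed setup, `src` would be the encoder input and the ground-truth transcript the decoder target during fine-tuning.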

Abstract

In this paper, we present an encoder-decoder model leveraging Flan-T5 for post-Automatic Speech Recognition (ASR) Generative Speech Error Correction (GenSEC), and we refer to it as FlanEC. We explore its application within the GenSEC framework to enhance ASR outputs by mapping n-best hypotheses into a single output sentence. By utilizing n-best lists from ASR models, we aim to improve the linguistic correctness, accuracy, and grammaticality of final ASR transcriptions. Specifically, we investigate whether scaling the training data and incorporating diverse datasets can lead to significant improvements in post-ASR error correction. We evaluate FlanEC using the HyPoradise dataset, providing a comprehensive analysis of the model's effectiveness in this domain. Furthermore, we assess the proposed approach under different settings to evaluate model scalability and efficiency, offering…
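Transcription accuracy in this setting is typically reported as word error rate (WER); treating WER as the metric here is an assumption of this sketch. The code computes WER from a word-level Levenshtein distance and also the n-best oracle WER, i.e. the best score obtainable by selecting one of the listed hypotheses. Because a generative corrector writes a new sentence rather than picking from the list, its output can in principle score below this oracle. The function names are hypothetical.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitution, insertion, deletion each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference word
                curr[j - 1] + 1,         # insert a hypothesis word
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edit distance divided by reference length."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hypothesis.split()) / len(ref)


def nbest_oracle_wer(reference: str, hypotheses: List[str]) -> float:
    """Lowest WER reachable by choosing a single hypothesis from the n-best list."""
    return min(wer(reference, h) for h in hypotheses)


reference = "i scream for ice cream"
nbest = ["eye scream for ice cream", "i scream four ice cream", "i scream for ice cream"]
print(wer(reference, nbest[0]))             # 1-best WER: 0.2
print(nbest_oracle_wer(reference, nbest))   # oracle WER: 0.0
```

The gap between the 1-best WER and the oracle WER gives a rough sense of how much an n-best corrector has to work with.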

Code & Models

Repositories

morenolaquatra/flanec
pytorch · Official

Models

Taxonomy

Methods: Flan-T5