Robust ASR Error Correction with Conservative Data Filtering
Takuma Udagawa, Masayuki Suzuki, Masayasu Muraoka, Gakuto Kurata

TL;DR
This paper introduces a conservative data filtering method for error correction in ASR systems, improving robustness and reducing overcorrection by selecting high-quality training pairs based on linguistic acceptability and contextual inferability.
Contribution
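It proposes two fundamental criteria for filtering training data in ASR error correction: targets should improve linguistic acceptability over sources, and they should be inferable from the available context. Pairs that fail these criteria are used to train the model not to make any correction, which improves robustness in out-of-domain scenarios.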
Findings
Reduces overcorrection in ASR error correction models.
Improves accuracy and quality of ASR outputs in OOD settings.
Demonstrates effectiveness on Japanese ASR benchmarks.
Abstract
Error correction (EC) based on large language models is an emerging technology to enhance the performance of automatic speech recognition (ASR) systems. Generally, training data for EC are collected by automatically pairing a large set of ASR hypotheses (as sources) and their gold references (as targets). However, the quality of such pairs is not guaranteed, and we observed various types of noise which can make the EC models brittle, e.g. inducing overcorrection in out-of-domain (OOD) settings. In this work, we propose two fundamental criteria that EC training data should satisfy: namely, EC targets should (1) improve linguistic acceptability over sources and (2) be inferable from the available context (e.g. source phonemes). Through these criteria, we identify low-quality EC pairs and train the models not to make any correction in such cases, the process we refer to as conservative…
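The filtering idea in the abstract can be illustrated with a minimal sketch. It assumes two caller-supplied scoring functions: the abstract does not say how acceptability or inferability are measured, so `toy_acceptability`, `toy_inferable`, the vocabulary, and the 0.6 threshold below are hypothetical stand-ins, not the authors' method. The only part taken from the abstract is the control flow: a pair that fails either criterion is kept, but its target is replaced by its source, so the model learns to make no correction in that case.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class ECPair:
    source: str  # ASR hypothesis
    target: str  # gold reference


def conservative_filter(
    pairs: List[ECPair],
    acceptability: Callable[[str], float],
    inferable: Callable[[str, str], bool],
) -> List[ECPair]:
    """Keep pairs that satisfy both criteria; turn the rest into no-op pairs."""
    result = []
    for pair in pairs:
        # Criterion 1: the target should be more acceptable than the source.
        improves = acceptability(pair.target) > acceptability(pair.source)
        # Criterion 2: the target should be inferable from the available context.
        if improves and inferable(pair.source, pair.target):
            result.append(pair)
        else:
            # Low-quality pair: train the model to leave the source unchanged.
            result.append(ECPair(pair.source, pair.source))
    return result


# Hypothetical toy scorers, for illustration only.
VOCAB = {"the", "weather", "is", "nice", "today"}


def toy_acceptability(text: str) -> float:
    # Fraction of in-vocabulary words (stand-in for a real acceptability score).
    words = text.split()
    return sum(w in VOCAB for w in words) / max(len(words), 1)


def toy_inferable(source: str, target: str) -> bool:
    # Character-level similarity (stand-in for a phoneme-based check).
    return SequenceMatcher(None, source, target).ratio() >= 0.6


pairs = [
    ECPair("the whether is nice", "the weather is nice"),  # clean correction
    ECPair("nise", "the weather is nice today"),           # target not inferable
    ECPair("the weather is nice", "the wether is nice"),   # target less acceptable
]
filtered = conservative_filter(pairs, toy_acceptability, toy_inferable)
```

In this toy run, only the first pair keeps its original target; the other two become identity pairs. Keeping such pairs as no-op examples, rather than dropping them, is what makes the filtering conservative in the sense the abstract describes.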
Taxonomy
Topics: Fault Detection and Control Systems
Methods: Sparse Evolutionary Training · Focus
