TL;DR
This paper introduces an LLM-based generative error correction method that improves rare word correction in speech recognition by fine-tuning on synthetic data containing rare words and by adding phonetic context to the ASR N-best hypotheses, reducing WER and CER on English and Japanese datasets.
Contribution
It combines synthetic data generation for rare words with phonetic cues to improve rare word correction in LLM-based generative error correction (GER) for automatic speech recognition (ASR).
Findings
Improves rare word correction accuracy.
Reduces word error rate (WER) and character error rate (CER) on both English and Japanese datasets.
Mitigates over-correction by integrating phonetic context.
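For readers unfamiliar with the two metrics named in the findings, here is a minimal sketch of how WER and CER are conventionally computed: edit distance between reference and hypothesis, divided by reference length. The function names are illustrative, and the choice to drop spaces before computing CER is an assumption; the paper's exact scoring setup and text normalization are not described in this summary.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, with spaces removed (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

CER is the usual choice for languages such as Japanese, where text is not whitespace-delimited and word boundaries depend on the tokenizer.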
Abstract
Generative error correction (GER) with large language models (LLMs) has emerged as an effective post-processing approach to improve automatic speech recognition (ASR) performance. However, it often struggles with rare or domain-specific words due to limited training data. Furthermore, existing LLM-based GER approaches primarily rely on textual information, neglecting phonetic cues, which leads to over-correction. To address these issues, we propose a novel LLM-based GER approach that targets rare words and incorporates phonetic information. First, we generate synthetic data to contain rare words for fine-tuning the GER model. Second, we integrate ASR's N-best hypotheses along with phonetic context to mitigate over-correction. Experimental results show that our method not only improves the correction of rare words but also reduces the WER and CER across both English and Japanese datasets.
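As a rough illustration of the second step in the abstract, the sketch below assembles a text prompt that gives an LLM both the N-best hypotheses and a phonetic rendering of the utterance. Everything here is hypothetical: the function name `build_ger_prompt`, the prompt wording, the `max_hypotheses` parameter, and the example phoneme string are assumptions made for illustration, not the authors' actual prompt format or phonetic representation.

```python
def build_ger_prompt(hypotheses, phonetic_context, max_hypotheses=5):
    """Assemble a correction prompt from ASR N-best hypotheses and a
    phonetic transcription (hypothetical format, not the paper's)."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    lines = [
        "The following are N-best hypotheses from a speech recognizer,",
        "ordered from most to least likely:",
    ]
    # Number the hypotheses so the model can see their rank.
    for rank, hyp in enumerate(hypotheses[:max_hypotheses], start=1):
        lines.append(f"{rank}. {hyp}")
    # The phonetic line anchors the correction to what was actually said,
    # which is the intuition behind reducing over-correction.
    lines.append(f"Phonetic transcription: {phonetic_context}")
    lines.append(
        "Output the corrected transcript. Change a word only if the "
        "change is consistent with the phonetic transcription."
    )
    return "\n".join(lines)


# Illustrative inputs only; not taken from the paper's data.
prompt = build_ger_prompt(
    ["i took i be profen", "i took ibuprofen", "i took eye be profen"],
    "AY T UH K AY B Y UW P R OW F AH N",
    max_hypotheses=2,
)
```

In a setup like the one the abstract describes, a prompt of this kind would be paired with the reference transcript as the target when fine-tuning on synthetic examples that contain rare words.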
Taxonomy
Methods: Graph Convolutional Network · Gait Emotion Recognition
