Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization
Alexandra Antonova

TL;DR
This paper introduces a large-scale synthetic dataset for customizing automatic speech recognition systems, focusing on rare and out-of-vocabulary phrases, and reports that training a customization model on it with hard negative biasing phrases lowers WER and the number of false alarms.
Contribution
It provides the first large-scale synthetic dataset for ASR customization, including methods for generating realistic corrupted hypotheses and hard negative biasing phrases.
Findings
Hard negative biasing phrases reduce WER and the number of false alarms
Dataset enables realistic simulation of rare phrases
Improves customization model performance
Abstract
We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR) with a focus on diverse rare and out-of-vocabulary (OOV) phrases, such as proper names or terms. The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task. Furthermore, we propose injecting two types of "hard negatives" into the simulated biasing lists in training examples and describe our procedures to automatically mine them. We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
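To make the idea of a simulated biasing list with hard negatives more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not say what the two types of hard negatives are or how they are mined, so this toy version uses a single stand-in criterion (character-level similarity to the target phrase via the standard-library `difflib`). The function name `simulate_biasing_list`, its parameters, and the example phrases are all hypothetical.

```python
import difflib
import random


def simulate_biasing_list(target, candidate_pool, n_hard=2, n_random=3, seed=0):
    """Build a toy biasing list: target phrase + hard negatives + random negatives.

    Assumption: "hard negatives" are approximated here as the pool phrases
    most similar to the target by character-level ratio. The paper's actual
    mining procedures are not described in the abstract.
    """
    rng = random.Random(seed)
    others = [p for p in candidate_pool if p != target]

    # Rank the remaining phrases by similarity to the target, most similar first.
    ranked = sorted(
        others,
        key=lambda p: difflib.SequenceMatcher(None, target, p).ratio(),
        reverse=True,
    )
    hard = ranked[:n_hard]
    rest = ranked[n_hard:]

    # Fill the rest of the list with randomly chosen, less similar phrases.
    easy = rng.sample(rest, min(n_random, len(rest)))

    biasing = [target] + hard + easy
    rng.shuffle(biasing)  # avoid leaking the target's position to a model
    return biasing, hard


pool = ["antonova", "antonov", "antonia", "speech",
        "wikipedia", "dataset", "customization"]
biasing_list, hard_negatives = simulate_biasing_list("antonova", pool)
print(hard_negatives)
print(biasing_list)
```

The intuition behind such a design is that look-alike distractors force a customization model to discriminate between close candidates instead of replacing text with any vaguely similar phrase, which is consistent with the reported drop in false alarms.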
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
Methods: Focus
