Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization

Alexandra Antonova

arXiv:2309.17267·eess.AS·October 2, 2023

TL;DR

This paper introduces a large-scale public synthetic dataset for customizing automatic speech recognition (ASR) systems, focused on rare and out-of-vocabulary phrases, and shows that training a customization model with hard negative biasing phrases lowers WER and the number of false alarms.

Contribution

The paper provides the first large-scale public synthetic dataset for ASR customization, together with methods for generating realistic corrupted ASR hypotheses and for automatically mining hard negative biasing phrases.

Findings

01 Hard negative biasing phrases reduce WER and the number of false alarms

02 The dataset enables realistic simulation of rare and out-of-vocabulary phrases

03 Training on the dataset improves customization model performance

Abstract

We present the first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR), with a focus on diverse rare and out-of-vocabulary (OOV) phrases such as proper names or terms. The proposed approach makes it possible to create millions of realistic examples of corrupted ASR hypotheses and to simulate non-trivial biasing lists for the customization task. Furthermore, we propose injecting two types of "hard negatives" into the simulated biasing lists in training examples and describe our procedures for mining them automatically. We report experiments with training an open-source customization model on the proposed dataset and show that injecting hard negative biasing phrases decreases WER and the number of false alarms.
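
As a rough illustration of the biasing-list simulation described in the abstract, the sketch below assembles a list for one target phrase from the correct phrase, a few hard negatives, and a few random distractors. The mining criterion used here (character-level similarity via Python's `difflib`) is an assumption made for illustration only: the abstract does not say how the two types of hard negatives are mined. The function names and the phrase pool are likewise hypothetical.

```python
import difflib
import random


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (illustrative criterion only)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def mine_hard_negatives(target: str, pool: list[str], k: int) -> list[str]:
    """Return the k pool phrases most similar to the target, excluding the target."""
    candidates = [p for p in pool if p != target]
    candidates.sort(key=lambda p: similarity(target, p), reverse=True)
    return candidates[:k]


def build_biasing_list(target: str, pool: list[str],
                       n_hard: int, n_random: int, seed: int = 0) -> list[str]:
    """Simulate a biasing list: target + hard negatives + random distractors."""
    rng = random.Random(seed)
    hard = mine_hard_negatives(target, pool, n_hard)
    # Random distractors are drawn from phrases that are neither the target
    # nor already selected as hard negatives.
    rest = [p for p in pool if p != target and p not in hard]
    easy = rng.sample(rest, min(n_random, len(rest)))
    phrases = [target] + hard + easy
    rng.shuffle(phrases)  # avoid leaking the target's position
    return phrases


if __name__ == "__main__":
    # Hypothetical phrase pool; not taken from the dataset.
    pool = ["antonova", "antonov", "anton", "barcelona", "kubernetes", "tokyo"]
    print(build_biasing_list("antonova", pool, n_hard=2, n_random=2))
```

The point of the hard negatives is that a model trained only with random distractors can learn to pick whichever biasing phrase loosely resembles the hypothesis; near-miss phrases in the list force it to discriminate more carefully, which is consistent with the reduction in false alarms reported in the abstract.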

Code & Models

Datasets

bene-ges/wiki-en-asr-adapt
dataset · 49 dl

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling

Methods: Focus