Rethinking Evaluation in ASR: Are Our Models Robust Enough?

Tatiana Likhomanenko; Qiantong Xu; Vineel Pratap; Paden Tomasello; Jacob Kahn; Gilad Avidov; Ronan Collobert; Gabriel Synnaeve

arXiv:2010.11745·cs.LG·May 4, 2021

TL;DR

This paper questions the effectiveness of current ASR evaluation practices, showing that models trained on diverse datasets generalize better across domains and that averaging performance over multiple benchmarks is a reliable indicator of real-world robustness.

Contribution

The study demonstrates the importance of diverse training data and multiple benchmarks for improving and assessing ASR model robustness and generalization.

Findings

01

Reverberation and noise augmentation enhance cross-domain performance.

02

Average WER over multiple benchmarks correlates with real-world robustness.

03

Combined training on multiple datasets yields competitive results.
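The second finding rests on a single aggregate number: WER averaged over several benchmarks. A minimal sketch of how such a figure could be computed follows. It assumes an unweighted mean of per-benchmark corpus WERs, which this summary does not confirm as the paper's exact protocol, and the function names and benchmark keys (`clean`, `noisy`) are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (reference transcript, hypothesis transcript)


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Pair]) -> float:
    """Total word errors divided by total reference words for one benchmark."""
    errors = 0
    ref_words_total = 0
    for ref, hyp in pairs:
        ref_words = ref.split()
        errors += word_edit_distance(ref_words, hyp.split())
        ref_words_total += len(ref_words)
    if ref_words_total == 0:
        raise ValueError("benchmark has no reference words")
    return errors / ref_words_total


def average_wer(benchmarks: Dict[str, List[Pair]]) -> float:
    """Unweighted mean of per-benchmark WERs (an assumed aggregation rule)."""
    if not benchmarks:
        raise ValueError("no benchmarks given")
    per_benchmark = [corpus_wer(pairs) for pairs in benchmarks.values()]
    return sum(per_benchmark) / len(per_benchmark)


if __name__ == "__main__":
    toy = {
        "clean": [("a b", "a b")],
        "noisy": [("a b c d", "a x c")],
    }
    print(average_wer(toy))
```

An unweighted mean gives each benchmark equal say regardless of its size; pooling all utterances before dividing would instead let the largest test set dominate the average.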

Abstract

Is pushing numbers on a single benchmark valuable in automatic speech recognition? Research results in acoustic modeling are typically evaluated based on performance on a single dataset. While the research community has coalesced around various benchmarks, we set out to understand generalization performance in acoustic modeling across datasets - in particular, if models trained on a single dataset transfer to other (possibly out-of-domain) datasets. We show that, in general, reverberative and additive noise augmentation improves generalization performance across domains. Further, we demonstrate that when a large enough set of benchmarks is used, average word error rate (WER) performance over them provides a good proxy for performance on real-world noisy data. Finally, we show that training a single acoustic model on the most widely-used datasets - combined - reaches competitive…
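To make the two augmentations named in the abstract concrete, here is a minimal pure-Python sketch: additive noise mixed at a target signal-to-noise ratio, and reverberation simulated by convolving with a room impulse response. This is a generic textbook formulation, not the authors' pipeline; the function names, the SNR parameter, and the truncation to the input length are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Average squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)


def add_noise_at_snr(speech: Sequence[float],
                     noise: Sequence[float],
                     snr_db: float) -> List[float]:
    """Scale `noise` so speech-to-noise power matches `snr_db`, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise is silent; cannot set an SNR")
    # Solve P_speech / (scale^2 * P_noise) = 10^(snr_db / 10) for scale.
    scale = math.sqrt(mean_power(speech) / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, noise)]


def reverberate(speech: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Convolve speech with a room impulse response, truncated to input length."""
    out = [0.0] * (len(speech) + len(rir) - 1)
    for i, s in enumerate(speech):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out[:len(speech)]


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    print(add_noise_at_snr(speech, noise, snr_db=10.0))
    print(reverberate([1.0, 0.0, 0.0], [1.0, 0.5]))
```

The direct double loop is quadratic and only suitable for short toy signals; a practical pipeline would typically use an FFT-based convolution and sample the noise clip, impulse response, and SNR at random per utterance.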

Code & Models

Repositories

facebookresearch/wav2letter/tree/master/recipes/rasr
Official
