ASR4REAL: An extended benchmark for speech models

Morgane Riviere; Jade Copet; Gabriel Synnaeve

arXiv:2110.08583 · eess.AS · October 19, 2021 · 1 citation

TL;DR

This paper introduces ASR4REAL, a new benchmark for speech recognition that evaluates models across diverse real-world conditions, revealing biases and weaknesses not captured by existing datasets.

Contribution

It presents a set of benchmarks that assess speech models across gender, accent, socio-economic status, and conversational speech, highlighting where current models need improvement.

Findings

01

Recent models show no apparent gender bias, but they show large performance gaps by accent and even larger ones by the speakers' socio-economic status.

02

All tested models show a strong performance drop on conversational speech.

03

Even a language model trained on Common Crawl does not significantly improve conversational speech recognition.

Abstract

Popular ASR benchmarks such as LibriSpeech and Switchboard are limited in the diversity of settings and speakers they represent. We introduce a set of benchmarks matching real-life conditions, aimed at spotting possible biases and weaknesses in models. We found that although recent models do not seem to exhibit a gender bias, they usually show important performance discrepancies by accent, and even larger ones depending on the socio-economic status of the speakers. Finally, all tested models show a strong performance drop when tested on conversational speech, and in this context even a language model trained on a dataset as big as Common Crawl does not seem to have a significant positive effect, which reiterates the importance of developing conversational language models.
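Per-group performance discrepancies like those described in the abstract are commonly measured with word error rate (WER). The sketch below assumes WER as the metric, corpus-level aggregation within each speaker group, and the gap between the worst and best group as a summary statistic; none of these choices is confirmed by this page, and the paper's exact protocol may differ. The helper names (`edit_distance`, `wer_by_group`, `discrepancy`), group labels, and transcripts are hypothetical toy data, not ASR4REAL results.

```python
from collections import defaultdict


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at zero cost)
            )
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) strings.

    Returns corpus-level WER per group: total word errors divided by
    total reference words within that group.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_words, hyp_words = ref.split(), hyp.split()
        errors[group] += edit_distance(ref_words, hyp_words)
        words[group] += len(ref_words)
    return {g: errors[g] / words[g] for g in errors}


def discrepancy(wers):
    """Hypothetical summary statistic: worst-group WER minus best-group WER."""
    return max(wers.values()) - min(wers.values())


# Invented toy transcripts with hypothetical accent labels.
samples = [
    ("accent_A", "the cat sat on the mat", "the cat sat on the mat"),
    ("accent_B", "the cat sat on the mat", "a cat sat on mat"),
]
wers = wer_by_group(samples)
print(wers, discrepancy(wers))
```

Pooling errors and reference words within each group, rather than averaging per-utterance WER, weights every word equally, which is the usual convention for corpus-level WER. The same per-group table could be built over any speaker attribute (gender, accent, socio-economic status) or over read versus conversational subsets.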

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques