ASR Benchmarking: Need for a More Representative Conversational Dataset

Gaurav Maheshwari; Dmitry Ivanov; Théo Johannet; Kevin El Haddad

arXiv:2409.12042 · cs.CL · September 19, 2024

TL;DR

This paper highlights the inadequacy of current ASR benchmarks for real-world conversations and introduces a new multilingual dataset to better evaluate ASR performance in realistic, disfluent speech scenarios.

Contribution

The study presents a new multilingual conversational dataset from TalkBank, emphasizing the need for more representative benchmarks for ASR systems.

Findings

01 Significant performance drops of ASR models in conversational settings

02 Correlation between disfluencies and increased Word Error Rate

03 Current benchmarks do not reflect real-world conversational complexities

Abstract

Automatic Speech Recognition (ASR) systems have achieved remarkable performance on widely used benchmarks such as LibriSpeech and Fleurs. However, these benchmarks do not adequately reflect the complexities of real-world conversational environments, where speech is often unstructured and contains disfluencies such as pauses, interruptions, and diverse accents. In this study, we introduce a multilingual conversational dataset, derived from TalkBank, consisting of unstructured phone conversations between adults. Our results show a significant performance drop across various state-of-the-art ASR models when tested in conversational settings. Furthermore, we observe a correlation between Word Error Rate and the presence of speech disfluencies, highlighting the critical need for more realistic, conversational ASR benchmarks.
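For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how Word Error Rate (WER) is conventionally computed: the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The `pearson` helper shows one common way to relate per-utterance WER to a disfluency count. The function names, the lowercase-and-split normalization, and the choice of Pearson correlation are all assumptions made for illustration; this page does not state which normalization or correlation statistic the authors used.

```python
from math import sqrt


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Normalization here (lowercase + whitespace split) is an assumption,
    not the paper's documented preprocessing.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy example (invented, not from the dataset): the hypothesis drops
    # the filler "um" and the repeated "i", giving 2 deletions over 7 words.
    print(word_error_rate("um i i think we should go", "i think we should go"))
```

Because the denominator is the reference length, a hypothesis with many inserted words can yield a WER above 1.0. Whether filled pauses and repetitions are kept in the reference transcript also changes the score, which is one reason disfluent speech is harder to benchmark consistently.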

Code & Models

Repositories

diabolocom-research/conversationaldataset (official)

Datasets

diabolocom/talkbank_4_stt
dataset · 281 downloads

Taxonomy

Topics: Speech and dialogue systems