AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR

Eugen Beck; Sarah Beranek; Uma Moothiringote; Daniel Mann; Wilfried Michel; Katie Nguyen; Taylor Tragemann

arXiv:2604.27543 · cs.CL · May 1, 2026

TL;DR

This paper introduces the AppTek Call-Center Dialogues corpus, a diverse, spontaneous English speech dataset with multiple accents, designed for evaluating ASR robustness in conversational AI.

Contribution

The work provides a new benchmark dataset for evaluating English ASR across multiple accents and service scenarios, addressing limitations of existing corpora such as pre-segmented audio, read or prepared speech, and missing dialect annotations.

Findings

1. ASR performance varies significantly across accents.
2. The choice of segmentation approach affects ASR accuracy.
3. Good performance on General American English benchmarks does not guarantee robustness to other accents.

Abstract

Evaluating English ASR systems for conversational AI applications remains difficult, as many publicly available corpora are either pre-segmented into short utterances, consist of read or prepared speech, or lack explicit dialect annotations needed to evaluate robustness for a diverse user base. This work presents the AppTek Call-Center Dialogues corpus, a collection of spontaneous, role-played agent-customer conversations spanning fourteen English accents and sixteen service-oriented scenarios. The dataset was commissioned specifically for evaluation, and none of the audio or text was publicly available prior to release, reducing the risk of overlap with existing large-scale pretraining corpora. We benchmark a set of open-source ASR systems under different segmentation approaches. Results show substantial variation across accents and segmentation methods, indicating that good performance on…
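The abstract reports results broken down by accent, but the visible text does not name the metric or the scoring pipeline. As a minimal sketch, assuming word error rate (WER), the standard ASR metric, and a hypothetical record layout of (accent, reference transcript, hypothesis transcript) per conversation, per-accent scores can be aggregated at corpus level: total word errors divided by total reference words. This is not the authors' evaluation code; the lowercase-and-split normalization is a placeholder assumption, and real evaluations typically apply a fuller text normalizer.

```python
from collections import defaultdict


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def wer_by_accent(records):
    """records: iterable of (accent, reference_text, hypothesis_text) tuples.

    Returns corpus-level WER per accent: summed word errors divided by summed
    reference words, rather than an average of per-conversation WERs.
    Normalization (lowercase + whitespace split) is an illustrative assumption.
    """
    errors = defaultdict(int)
    ref_words = defaultdict(int)
    for accent, ref, hyp in records:
        ref_tokens = ref.lower().split()
        hyp_tokens = hyp.lower().split()
        errors[accent] += word_edit_distance(ref_tokens, hyp_tokens)
        ref_words[accent] += len(ref_tokens)
    return {a: errors[a] / ref_words[a] for a in ref_words if ref_words[a] > 0}


# Usage with hypothetical accent labels and toy transcripts.
records = [
    ("accent_a", "i would like to cancel my order", "i would like to cancel my order"),
    ("accent_b", "my account number is five two one", "my count number is five to one"),
]
scores = wer_by_accent(records)
```

Scoring each conversation's full reference against the full hypothesis, as here, sidesteps the question of how segment boundaries line up, which is one reason long-form evaluation is sensitive to the segmentation approach used upstream; the quadratic edit-distance table is fine for a sketch but may be slow for very long transcripts.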
