Earnings-21: A Practical Benchmark for ASR in the Wild
Miguel Del Rio, Natalie Delworth, Ryan Westerman, Michelle Huang, Nishchal Bhandari, Joseph Palakapilly, Quinten McNamara, Joshua Dong, Piotr Zelasko, Miguel Jette

TL;DR
Earnings-21 is a new 39-hour benchmark dataset of earnings calls designed to evaluate and improve ASR systems' performance in real-world financial scenarios, especially for named entity recognition.
Contribution
The paper introduces Earnings-21, a comprehensive benchmark corpus for assessing ASR systems in the wild with a focus on financial speech and entity recognition, filling a gap in existing datasets.
Findings
ASR models show poor accuracy on certain NER categories.
Earnings-21 enables detailed analysis of ASR performance in real-world financial audio.
Benchmarking reveals significant room for improvement in entity recognition accuracy.
Abstract
Commonly used speech corpora inadequately challenge academic and commercial ASR systems. In particular, speech corpora lack metadata needed for detailed analysis and WER measurement. In response, we present Earnings-21, a 39-hour corpus of earnings calls containing entity-dense speech from nine different financial sectors. This corpus is intended to benchmark ASR systems in the wild with special attention towards named entity recognition. We benchmark four commercial ASR models, two internal models built with open-source tools, and an open-source LibriSpeech model and discuss their differences in performance on Earnings-21. Using our recently released fstalign tool, we provide a candid analysis of each model's recognition capabilities under different partitions. Our analysis finds that ASR accuracy for certain NER categories is poor, presenting a significant impediment to transcript…
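The abstract refers to WER (word error rate) measurement. As a minimal sketch of how that metric is usually defined, the function below computes the word-level edit distance between a reference transcript and an ASR hypothesis and divides it by the reference length. This is not the fstalign implementation; the function name `wer` and the example sentences are hypothetical, and whitespace tokenization without text normalization is a simplifying assumption.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words.

    Illustrative sketch only: tokens are split on whitespace and compared
    exactly, with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of four reference words.
score = wer("revenue grew ten percent", "revenue grew tin percent")
```

Restricting the reference and hypothesis to tokens inside a given entity category before scoring is one way a per-category error rate could be derived, though the paper's exact partitioning procedure is not described in this summary.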
