Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish

Michał Junczyk

arXiv:2408.00005 · eess.AS · August 2, 2024 · 1 citation

TL;DR

This paper presents a framework for surveying, cataloging, and curating speech datasets and for replicable evaluation of ASR systems. It is demonstrated in a Polish case study that curated more than 24 datasets and evaluated 25 combinations of ASR systems and models.

Contribution

The paper introduces a framework for dataset curation and ASR evaluation and applies it to Polish, enabling a replicable comparison of commercial and free ASR systems on curated datasets.

Findings

1. Curated more than 24 datasets for Polish ASR.
2. Evaluated 25 system-model combinations.
3. Provided interactive dashboards and open datasets.

Abstract

Speech datasets available in the public domain are often underutilized because of challenges in discoverability and interoperability. A comprehensive framework has been designed to survey, catalog, and curate available speech datasets, which allows replicable evaluation of automatic speech recognition (ASR) systems. A case study focused on the Polish language was conducted; the framework was applied to curate more than 24 datasets and evaluate 25 combinations of ASR systems and models. This research constitutes the most extensive comparison to date of both commercial and free ASR systems for the Polish language. It draws insights from 600 system-model-test set evaluations, marking a significant advancement in both scale and comprehensiveness. The results of surveys and performance comparisons are available as interactive dashboards…
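As a rough illustration of what a system-model-test set evaluation grid might look like, here is a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: word error rate (WER) is used as the scoring metric only because it is the conventional choice for ASR, the names `word_error_rate`, `evaluation_grid`, `system_model_*`, and `test_set_*` are hypothetical, and the split of 600 evaluations into 25 system-model combinations times 24 test sets is merely one reading that is numerically consistent with the reported figures. It is not the author's implementation.

```python
from itertools import product


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,  # deletion of a reference word
                    curr[j - 1] + 1,  # insertion of a hypothesis word
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def evaluation_grid(systems, test_sets):
    """Every (system-model, test set) pair that would need to be scored."""
    return list(product(systems, test_sets))


# Hypothetical placeholder names; the real systems and datasets are not listed here.
systems = [f"system_model_{i}" for i in range(25)]
test_sets = [f"test_set_{i}" for i in range(24)]
grid = evaluation_grid(systems, test_sets)

print(len(grid))
print(word_error_rate("ala ma kota", "ala ma psa"))
```

Under these assumptions, each pair in `grid` would be scored by transcribing the test set with the given system-model combination and averaging or pooling the per-utterance error counts; how the framework actually aggregates results is not stated in the abstract.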

Code & Models

Repositories

goodmike31/pl-asr-bigos-tools (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems

Methods: Sparse Evolutionary Training