CrowdSpeech and VoxDIY: Benchmark Datasets for Crowdsourced Audio Transcription

Nikita Pavlichenko; Ivan Stelmakh; Dmitry Ustalov

arXiv:2107.01091 · cs.SD · October 22, 2021 · 1 citation

TL;DR

This paper introduces CrowdSpeech and VoxDIY, large-scale datasets for crowdsourced audio transcription, and proposes a principled methodology for reliable data collection and aggregation in speech recognition tasks.

Contribution

It provides the first large-scale crowdsourced audio transcription datasets and a general pipeline for data collection applicable to new domains and languages.

Findings

1. Existing aggregation methods have room for improvement.
2. The proposed pipeline is effective for under-resourced languages.
3. Open-source code enables replication and further research.

Abstract

Domain-specific data is the crux of the successful transfer of machine learning systems from benchmarks to real life. In simple problems such as image classification, crowdsourcing has become one of the standard tools for cheap and time-efficient data collection, thanks in large part to advances in research on aggregation methods. However, the applicability of crowdsourcing to more complex tasks (e.g., speech recognition) remains limited due to the lack of principled aggregation methods for these modalities. The main obstacle to designing aggregation methods for more advanced applications is the absence of training data, and in this work, we focus on bridging this gap in speech recognition. To this end, we collect and release CrowdSpeech, the first publicly available large-scale dataset of crowdsourced audio transcriptions. Evaluation of existing and novel aggregation methods on our…
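To make the aggregation problem in the abstract concrete, here is a minimal Python sketch of one very simple baseline: given several noisy crowd transcriptions of the same recording, pick the one with the smallest total word-level edit distance to the others (a "medoid"). This is an illustrative assumption, not a method taken from the paper; the function names `word_edit_distance`, `aggregate_medoid`, and `word_error_rate` are hypothetical. Word error rate is included because it is a common way to score a transcription against a reference.

```python
from typing import List, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of words."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,         # delete word_a
                curr[j - 1] + 1,     # insert word_b
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def aggregate_medoid(transcriptions: List[str]) -> str:
    """Hypothetical baseline: return the transcription closest to all others."""
    if not transcriptions:
        raise ValueError("need at least one transcription")
    tokens = [t.lower().split() for t in transcriptions]
    totals = [sum(word_edit_distance(x, y) for y in tokens) for x in tokens]
    # Ties are resolved in favor of the earliest transcription.
    return transcriptions[totals.index(min(totals))]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Made-up example transcriptions, for illustration only.
    crowd = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "the cat sat on the mat",
        "a cat sit on the mat",
    ]
    result = aggregate_medoid(crowd)
    print(result)
    print(word_error_rate("the cat sat on the mat", result))
```

A medoid can only return one of the submitted transcriptions, so it cannot repair an error that every worker made in a different place. That limitation is one reason more elaborate aggregation methods are studied.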

Code & Models

Repositories

Toloka/CrowdSpeech (Official)

Models

toloka/t5-large-for-text-aggregation (Hugging Face model · 6 dl · ♡ 7)

Taxonomy

Topics: Music and Audio Processing · Mobile Crowdsensing and Crowdsourcing · Speech and Audio Processing