Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers

Raghad Salameh; Mohamad Al Mdfaa; Nursultan Askarbekuly; Manuel Mazzara

arXiv:2405.02675·cs.SD·May 7, 2024

TL;DR

This paper presents a crowdsourced, annotated Quranic audio dataset from non-Arabic speakers, enabling AI-based recitation learning tools, with detailed collection, annotation, and accuracy metrics.

Contribution

It introduces a novel crowdsourcing platform and dataset for Quranic recitations from non-Arabic speakers, facilitating AI development for recitation learning.

Findings

01

Collected around 7000 recitations from 1287 participants across more than 11 non-Arabic countries

02

Achieved a crowd accuracy of 0.77 and inter-rater agreement of 0.63

03

Reached a labeling accuracy of 0.89 when the algorithm's labels were compared with expert judgments

Abstract

This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers. We explore the possibility of crowdsourcing a carefully annotated Quranic dataset, on top of which AI models can be built to simplify the learning process. In particular, we use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets. We integrated the API into an existing mobile application called NamazApp to collect audio recitations. We developed a crowdsourcing platform called Quran Voice for annotating the gathered audio assets. As a result, we have collected around 7000 Quranic recitations from a pool of 1287 participants across more than 11 non-Arabic countries, and we have annotated 1166 recitations from the dataset in six categories. We have achieved a crowd accuracy of 0.77, an inter-rater agreement of 0.63 between the annotators, and 0.89…
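The two crowd metrics quoted above can be illustrated with a small sketch. The abstract does not say how they were computed, so everything here is an assumption: crowd accuracy is taken to be the share of recitations whose majority-vote label matches an expert label, and inter-rater agreement is taken to be Fleiss' kappa. The label names and the toy votes are hypothetical and are not drawn from the dataset.

```python
from collections import Counter


def majority_label(votes):
    """Most common label among one item's votes (ties go to the label seen first)."""
    return Counter(votes).most_common(1)[0][0]


def crowd_accuracy(votes_per_item, expert_labels):
    """Fraction of items whose majority-vote label equals the expert label."""
    hits = sum(
        majority_label(votes) == expert
        for votes, expert in zip(votes_per_item, expert_labels)
    )
    return hits / len(expert_labels)


def fleiss_kappa(votes_per_item, categories):
    """Fleiss' kappa, assuming every item received the same number of votes."""
    n_items = len(votes_per_item)
    n_raters = len(votes_per_item[0])
    if n_raters < 2 or any(len(v) != n_raters for v in votes_per_item):
        raise ValueError("each item needs the same number of votes (at least 2)")

    # counts[i][j] = number of raters who gave item i the j-th category
    counts = [[Counter(v)[c] for c in categories] for v in votes_per_item]

    # Observed agreement: mean share of agreeing rater pairs per item
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(categories))]
    p_expected = sum((t / (n_items * n_raters)) ** 2 for t in totals)

    if p_expected == 1.0:  # all votes fall in one category
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 4 recitations, 3 annotators each, 2 made-up labels
votes = [
    ["correct", "correct", "correct"],
    ["correct", "mistake", "correct"],
    ["mistake", "mistake", "mistake"],
    ["mistake", "correct", "mistake"],
]
experts = ["correct", "correct", "mistake", "correct"]

accuracy = crowd_accuracy(votes, experts)
kappa = fleiss_kappa(votes, ["correct", "mistake"])
print(f"crowd accuracy = {accuracy:.2f}, Fleiss' kappa = {kappa:.2f}")
```

On real annotations the number of votes per recitation may vary, in which case a statistic that tolerates missing ratings (for example Krippendorff's alpha) would be a more suitable choice than this fixed-rater version of Fleiss' kappa.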

Code & Models

Datasets

RetaSy/quranic_audio_dataset
dataset · 137 downloads

Taxonomy

Topics: Music and Audio Processing