Tadabur: A Large-Scale Quran Audio Dataset

Faisal Alherran

arXiv:2604.18932 · cs.SD · April 22, 2026

TL;DR

Tadabur is a large-scale Quran audio dataset with over 1,400 hours of recitation from more than 600 reciters, designed to advance Quranic speech research by providing diverse and extensive audio resources.

Contribution

It introduces a Quran audio dataset that is significantly larger and more diverse than existing resources, intended to support research in Quranic speech analysis and the development of standardized benchmarks.

Findings

1. Contains over 1,400 hours of recitation audio from more than 600 reciters.
2. Provides substantial variation in recitation styles and recording conditions.
3. Aims to facilitate future Quranic speech research and benchmarking.

Abstract

Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1,400 hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representative resource for Quranic speech research and analysis. By significantly expanding both the total duration and the variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.

Code & Models

Repositories

fherran/tadabur (GitHub)

Models

FaisaI/tadabur-Whisper-Small (Hugging Face model; 338 downloads, 14 likes)

Datasets

FaisaI/tadabur (Hugging Face dataset; 8.8k downloads)
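As a hedged illustration, the sketch below tallies the two headline figures quoted above (total recitation hours and number of distinct reciters) from per-clip metadata records. The field names `reciter` and `duration`, and the assumption that durations are stored in seconds, are hypothetical and not taken from the dataset's documented schema.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def summarize_recitations(
    clips: Iterable[Mapping[str, Any]],
    reciter_key: str = "reciter",    # hypothetical column name
    duration_key: str = "duration",  # hypothetical column name, assumed seconds
) -> tuple[float, int]:
    """Return (total hours, number of distinct reciters) for the given clips."""
    total_seconds = 0.0
    reciters = set()
    for clip in clips:
        total_seconds += float(clip[duration_key])  # accumulate clip length in seconds
        reciters.add(clip[reciter_key])             # the set deduplicates reciters
    return total_seconds / 3600.0, len(reciters)


if __name__ == "__main__":
    # Synthetic records for illustration only; not real Tadabur data.
    sample = [
        {"reciter": "A", "duration": 1800.0},
        {"reciter": "B", "duration": 3600.0},
        {"reciter": "A", "duration": 1800.0},
    ]
    hours, n_reciters = summarize_recitations(sample)
    print(f"{hours:.2f} hours from {n_reciters} reciters")  # 2.00 hours from 2 reciters
```

In practice the records might come from the Hugging Face `datasets` library (for example `load_dataset("FaisaI/tadabur")`), but the split and column names should be checked against the dataset card before relying on them.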
