Quran-MD: A Fine-Grained Multilingual Multimodal Dataset of the Quran
Muhammad Umar Salman, Mohammad Areeb Qazi, Mohammed Talha Alam

TL;DR
Quran-MD is a detailed multilingual multimodal dataset combining text, audio, and linguistic data of the Quran at verse and word levels, supporting advanced NLP and speech applications.
Contribution
The paper introduces a comprehensive, fine-grained dataset with diverse recitation styles, enabling new research in Quranic recitation, linguistic analysis, and multimodal AI applications.
Findings
Supports tasks like ASR, TTS, and tajweed detection
Provides diverse recitation audio from 32 reciters
Facilitates multimodal embeddings and semantic retrieval
Abstract
We present Quran-MD, a comprehensive multimodal dataset of the Quran that integrates textual, linguistic, and audio dimensions at the verse and word levels. For each verse (ayah), the dataset provides its original Arabic text, English translation, and phonetic transliteration. To capture the rich oral tradition of Quranic recitation, we include verse-level audio from 32 distinct reciters, reflecting diverse recitation styles and dialectal nuances. At the word level, each token is paired with its corresponding Arabic script, English translation, transliteration, and an aligned audio recording, allowing fine-grained analysis of pronunciation, phonology, and semantic context. This dataset supports various applications, including natural language processing, speech recognition, text-to-speech synthesis, linguistic analysis, and digital Islamic studies. Bridging text and audio modalities…
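The two levels of annotation described in the abstract can be pictured as a nested record: one entry per verse, holding its text fields, a mapping from reciter to audio file, and a list of word-level entries. The sketch below is illustrative only. The class names, field names, reciter identifiers, and file paths are all hypothetical and are not taken from the dataset's actual schema; the Arabic text is written without diacritics for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WordEntry:
    """One word-level record (field names are hypothetical)."""
    arabic: str
    translation: str
    transliteration: str
    audio_path: str  # placeholder path to the aligned word audio


@dataclass
class VerseEntry:
    """One verse-level record (field names are hypothetical)."""
    surah: int
    ayah: int
    arabic: str
    translation: str
    transliteration: str
    # Maps a reciter identifier to that reciter's verse-level audio file.
    reciter_audio: Dict[str, str] = field(default_factory=dict)
    words: List[WordEntry] = field(default_factory=list)

    def key(self) -> str:
        """Return a 'surah:ayah' reference string, e.g. '1:1'."""
        return f"{self.surah}:{self.ayah}"


def build_example() -> VerseEntry:
    """Build an illustrative record for verse 1:1 with placeholder audio paths."""
    words = [
        WordEntry("بسم", "In the name", "bismi", "words/1_1_1.mp3"),
        WordEntry("الله", "of Allah", "allahi", "words/1_1_2.mp3"),
        WordEntry("الرحمن", "the Most Gracious", "ar-rahmani", "words/1_1_3.mp3"),
        WordEntry("الرحيم", "the Most Merciful", "ar-rahimi", "words/1_1_4.mp3"),
    ]
    return VerseEntry(
        surah=1,
        ayah=1,
        # Verse text assembled from the word entries so the two levels agree.
        arabic=" ".join(w.arabic for w in words),
        translation="In the name of Allah, the Most Gracious, the Most Merciful",
        transliteration=" ".join(w.transliteration for w in words),
        # Two made-up reciter IDs; the dataset itself reports 32 reciters.
        reciter_audio={
            "reciter_01": "verses/reciter_01/1_1.mp3",
            "reciter_02": "verses/reciter_02/1_1.mp3",
        },
        words=words,
    )


if __name__ == "__main__":
    verse = build_example()
    print(verse.key(), len(verse.words), sorted(verse.reciter_audio))
```

Keeping the reciter audio as a mapping rather than fixed columns is one possible design choice: it lets a verse carry any number of recitations without changing the record shape.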
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Text and Document Classification Technologies
