MURAD: A Large-Scale Multi-Domain Unified Reverse Arabic Dictionary Dataset
Serry Sibaee, Yasser Alhabashi, Nadia Sibai, Yara Farouk, Adel Ammar, Sawsan AlHalawani, Wadii Boulila

TL;DR
MURAD is a comprehensive, multi-domain Arabic lexical dataset with over 96,000 word-definition pairs, designed to support NLP and lexicographic research through accurate, domain-annotated data.
Contribution
This paper introduces MURAD, the first large-scale, multi-domain reverse Arabic dictionary dataset with verified entries from diverse sources, enhancing Arabic NLP resources.
Findings
Dataset contains 96,243 word-definition pairs.
Includes terms from multiple scientific and literary domains.
Supports applications like reverse dictionary modeling and semantic retrieval.
Abstract
Arabic is a linguistically and culturally rich language with a vast vocabulary that spans scientific, religious, and literary domains. Yet, large-scale lexical datasets linking Arabic words to precise definitions remain limited. We present MURAD (Multi-domain Unified Reverse Arabic Dictionary), an open lexical dataset with 96,243 word-definition pairs. The data come from trusted reference works and educational sources. Extraction used a hybrid pipeline integrating direct text parsing, optical character recognition, and automated reconstruction. This ensures accuracy and clarity. Each record aligns a target word with its standardized Arabic definition and metadata that identifies the source domain. The dataset covers terms from linguistics, Islamic studies, mathematics, physics, psychology, and engineering. It supports computational linguistics and lexicographic research. Applications…
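The record layout described in the abstract (a target word, its Arabic definition, and a source-domain label) can be sketched as a small data structure, together with a toy reverse-dictionary lookup that ranks words by token overlap between a query and each definition. This is only an illustrative sketch: the field names `word`, `definition`, and `domain`, the `reverse_lookup` helper, and the three sample entries are assumptions made for this example and are not taken from MURAD. Jaccard overlap is a simple stand-in here, not the method used by the authors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """One word-definition pair. Field names are hypothetical."""
    word: str        # target word
    definition: str  # Arabic definition text
    domain: str      # source-domain label, e.g. "mathematics"


# Illustrative placeholder entries, NOT drawn from the MURAD dataset.
SAMPLE = [
    Entry("مثلث", "شكل هندسي له ثلاثة أضلاع", "mathematics"),
    Entry("مربع", "شكل هندسي له أربعة أضلاع متساوية", "mathematics"),
    Entry("سرعة", "المسافة المقطوعة في وحدة الزمن", "physics"),
]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def reverse_lookup(query: str, entries: List[Entry],
                   domain: Optional[str] = None, top_k: int = 3) -> List[str]:
    """Return up to top_k words whose definitions best overlap the query.

    Tokens are split on whitespace only; a real system would likely add
    Arabic normalization and a learned semantic encoder.
    """
    query_tokens = set(query.split())
    scored = []
    for entry in entries:
        # Optionally restrict the search to one source domain.
        if domain is not None and entry.domain != domain:
            continue
        score = jaccard(query_tokens, set(entry.definition.split()))
        if score > 0:
            scored.append((score, entry.word))
    # Highest overlap first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:top_k]]


if __name__ == "__main__":
    print(reverse_lookup("شكل له ثلاثة أضلاع", SAMPLE))
```

Passing `domain="physics"` with the same query returns an empty list, since no physics definition in the sample shares a token with it. That shows how a domain label could be used to narrow retrieval.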
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
