AHD: Arabic healthcare dataset
Nashwan Ahmed Al-Majmar, Hezam Gawbah, Akram Alsubari

TL;DR
This paper introduces a large Arabic healthcare dataset to support NLP research in Arabic, especially for text classification and generation.
Contribution
The paper presents AHD, a large Arabic healthcare dataset with over 808k questions and answers across 90 categories.
Findings
AHD contains over 808k questions and answers across 90 categories.
The dataset is scraped from the Altibbi medical website.
AHD is publicly available for research purposes.
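As a rough illustration of how a question-and-answer dataset organized by category might be inspected before a text classification experiment, the sketch below loads a few records and counts them per category. The CSV layout, the column names (`question`, `answer`, `category`), and the sample rows are assumptions made for this example; they are not taken from the AHD release, whose actual file format may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for a real file.
# The column names are assumptions, not the documented AHD schema.
SAMPLE_CSV = """question,answer,category
ما هي أعراض فقر الدم؟,التعب وشحوب البشرة من الأعراض الشائعة.,أمراض الدم
كيف أعالج الصداع النصفي؟,ينصح بمراجعة الطبيب لتحديد العلاج المناسب.,الأمراض العصبية
ما سبب نقص الحديد؟,قد ينتج عن سوء التغذية أو فقدان الدم.,أمراض الدم
"""


def load_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def category_counts(records):
    """Count how many question-answer pairs fall under each category."""
    return Counter(record["category"] for record in records)


records = load_records(SAMPLE_CSV)
counts = category_counts(records)

# Print categories from most to least frequent to reveal class imbalance.
for category, n in counts.most_common():
    print(category, n)
```

With many categories, a per-category count like this is a common first check, since a heavily skewed distribution usually affects how a classifier should be trained and evaluated.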
Abstract
With the soaring demand for healthcare systems, chatbots are gaining tremendous popularity and research attention, and a growing body of language-centric healthcare research is being conducted. Despite significant advances in Arabic Natural Language Processing (NLP), challenges remain in natural language classification and generation, primarily because suitable Arabic datasets for training are lacking. To address this, the authors introduce a large Arabic Healthcare Dataset (AHD) of textual data. The dataset consists of over 808k questions and answers across 90 categories and is offered to the research community for Arabic computational linguistics. The authors anticipate that this rich dataset will be a great aid for a variety of NLP tasks on Arabic textual data, especially text classification and generation. The authors present…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
