# AHD: Arabic healthcare dataset

**Authors:** Nashwan Ahmed Al-Majmar, Hezam Gawbah, Akram Alsubari

PMC · DOI: 10.1016/j.dib.2024.110855 · 2024-08-22

## TL;DR

This paper introduces a large Arabic healthcare dataset to support NLP research in Arabic, especially for text classification and generation.

## Contribution

The paper presents AHD, a large Arabic healthcare dataset with over 808k questions and answers across 90 categories.

## Key findings

- AHD contains over 808k questions and answers across 90 categories.
- The dataset is scraped from the Altibbi medical website.
- AHD is publicly available for research purposes.

## Abstract

With the soaring demand for healthcare systems, chatbots are gaining tremendous popularity and research attention, and a growing body of language-centric healthcare research is being conducted. Despite significant advances in Arabic Natural Language Processing (NLP), challenges remain in natural language classification and generation, and the primary shortcoming of models for these tasks is the lack of suitable Arabic datasets for training. To address this, the authors introduce the Arabic Healthcare Dataset (AHD), a large collection of textual data. The dataset consists of over 808k questions and answers across 90 categories and is offered to the research community for Arabic computational linguistics. The authors anticipate that this rich dataset will be a great aid for a variety of NLP tasks on Arabic textual data, especially text classification and generation. The data are presented in raw form. The main dataset was scraped from the Altibbi medical website. AHD is public and freely available at http://data.mendeley.com/datasets/mgj29ndgrk/5.
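Since the abstract describes the data as question and answer records grouped into categories, a first step for classification work is usually to check how records are distributed across categories. The following is a minimal sketch of that step, not the authors' code. The column names `question`, `answer`, and `category` are assumptions (the real AHD files may use different headers or a different file layout), and the sample rows are placeholders, not actual AHD content.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the real AHD files may use different headers.
QUESTION_COL = "question"
ANSWER_COL = "answer"
CATEGORY_COL = "category"


def category_counts(csv_file):
    """Count complete question-answer records per category.

    `csv_file` is any open text file object with a header row.
    Rows with an empty question or an empty answer are skipped.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        question = (row.get(QUESTION_COL) or "").strip()
        answer = (row.get(ANSWER_COL) or "").strip()
        if not question or not answer:
            continue
        counts[(row.get(CATEGORY_COL) or "").strip()] += 1
    return counts


# Placeholder rows for illustration only (not real AHD data).
SAMPLE = """question,answer,category
q1,a1,cat_a
q2,a2,cat_b
q3,,cat_a
q4,a4,cat_a
"""

if __name__ == "__main__":
    for category, n in category_counts(io.StringIO(SAMPLE)).most_common():
        print(category, n)
```

To run the same function on a downloaded file, one would pass an open file object instead of the in-memory sample, for example `open(path, encoding="utf-8", newline="")`, assuming the file is UTF-8 encoded CSV.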

## Full-text entities

- **Diseases:** AHD (MESH:D003428)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11403399/full.md

---
Source: https://tomesphere.com/paper/PMC11403399