New Arabic Medical Dataset for Diseases Classification
Jaafar Hammoud, Aleksandra Vatian, Natalia Dobrenko, Nikolai Vedernikov, Anatoly Shalyto, Natalia Gusarova

TL;DR
This paper introduces a new Arabic medical dataset with 2,000 documents across 10 disease categories, addressing the lack of specialized Arabic medical datasets for deep learning.
Contribution
It provides a novel, labeled Arabic medical dataset and evaluates the performance of three pre-trained models on disease classification tasks.
Findings
Arabert outperformed BERT and AraBioNER in classification accuracy.
The dataset enables better training of Arabic medical NLP models.
Fine-tuning pre-trained models improves disease classification in Arabic texts.
Abstract
The Arabic language suffers from a severe shortage of datasets suitable for training deep learning models, and the existing ones cover only general, non-specialized categories. In this work, we introduce a new Arabic medical dataset of two thousand medical documents collected from several Arabic medical websites and from the Arab Medical Encyclopedia. The dataset was built for text classification and covers 10 disease classes: Blood, Bone, Cardiovascular, Ear, Endocrine, Eye, Gastrointestinal, Immune, Liver, and Nephrological. Experiments on the dataset were performed by fine-tuning three pre-trained models: BERT from Google; Arabert, which is based on BERT and trained on a large Arabic corpus; and AraBioNER, which is based on Arabert and trained on an Arabic medical corpus.
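As a rough illustration of how a 10-class dataset like this might be prepared before fine-tuning, the sketch below maps the class names from the abstract to integer labels and makes a stratified train/test split. Only the class names come from the paper. The label ids, the 80/20 split, the seed, the `stratified_split` helper, and the placeholder records (including their equal class sizes) are assumptions made for the example, not details reported by the authors.

```python
import random
from collections import defaultdict

# Class names as listed in the abstract; the integer ids are an arbitrary choice.
CLASSES = [
    "Blood", "Bone", "Cardiovascular", "Ear", "Endocrine",
    "Eye", "Gastrointestinal", "Immune", "Liver", "Nephrological",
]
LABEL2ID = {name: i for i, name in enumerate(CLASSES)}
ID2LABEL = {i: name for name, i in LABEL2ID.items()}


def stratified_split(records, test_fraction=0.2, seed=0):
    """Split (text, class_name) pairs into train/test lists of (text, label_id).

    Each class contributes roughly `test_fraction` of its documents to the
    test set. Hypothetical helper; the paper does not describe its split.
    """
    by_class = defaultdict(list)
    for text, name in records:
        by_class[name].append((text, LABEL2ID[name]))

    rng = random.Random(seed)
    train, test = [], []
    for name in CLASSES:
        items = by_class[name]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])

    # Shuffle again so documents of one class are not grouped together.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Placeholder records standing in for real documents (assumption: 200 per
# class, chosen only so the demo totals 2,000 items).
records = [(f"document {i} about {name}", name)
           for name in CLASSES for i in range(200)]
train, test = stratified_split(records)
```

The resulting `(text, label_id)` pairs are the usual input shape for a sequence-classification fine-tuning loop, with `ID2LABEL` available to turn predictions back into class names.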
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Adam · Layer Normalization · Weight Decay · Dropout · Linear Warmup With Linear Decay
