Medical Dataset Classification for Kurdish Short Text over Social Media
Ari M. Saeed, Shnya R. Hussein, Chro M. Ali, Tarik A. Rashid

TL;DR
This paper presents a new Kurdish medical dataset from social media comments and applies text classification techniques to distinguish medical comments from non-medical ones, aiding health-related social media analysis.
Contribution
It introduces a novel Kurdish medical dataset from Facebook comments and details a preprocessing and labeling process for classifying medical versus non-medical comments.
Findings
The dataset contains 6756 comments, of which 45% are medical and 55% are non-medical.
Six preprocessing steps clean the raw comments and remove noise by replacing characters.
The dataset supports future research in medical text analysis for Kurdish social media.
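To make the class balance concrete, the short sketch below converts the reported percentages into approximate comment counts and derives the accuracy of a trivial classifier that always predicts the majority class. The counts are estimates only, since the source gives rounded percentages rather than exact class sizes; the function and variable names are illustrative and do not come from the paper.

```python
# Figures taken from the summary: 6756 comments, 45% medical, 55% non-medical.
TOTAL_COMMENTS = 6756
CLASS_SHARES = {"medical": 0.45, "non_medical": 0.55}


def approximate_counts(total, shares):
    """Estimate per-class counts from rounded percentage shares."""
    return {label: round(total * share) for label, share in shares.items()}


def majority_baseline_accuracy(shares):
    """Accuracy of always predicting the most frequent class."""
    return max(shares.values())


counts = approximate_counts(TOTAL_COMMENTS, CLASS_SHARES)
baseline = majority_baseline_accuracy(CLASS_SHARES)

for label, count in counts.items():
    print(f"{label}: about {count} comments")
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

Any classifier trained on this dataset should be compared against that majority-class baseline of roughly 0.55, because the two classes are close to balanced but not equal.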
Abstract
Facebook is the source of the comments in this dataset. The Medical Kurdish Dataset (MKD) consists of 6756 user comments gathered from posts on pages of different kinds (Medical, News, Economy, Education, and Sport). Six preprocessing steps are applied to the raw dataset to clean the comments and remove noise by replacing characters. For text classification, each comment (a short text) is labeled as either the positive class (medical comment) or the negative class (non-medical comment). The negative class makes up 55% of the dataset and the positive class 45%.
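The abstract does not list the six preprocessing steps, so the following is only a minimal sketch of what cleaning by character replacement could look like for Kurdish text written in Arabic script. Every rule here is an assumption chosen for illustration (mapping Arabic code points to their common Kurdish counterparts, dropping the tatweel elongation mark, removing URLs, collapsing whitespace), and the name `clean_comment` is hypothetical; none of it is confirmed as the authors' pipeline.

```python
import re

# Assumed character replacements, not the paper's actual rules.
CHAR_MAP = str.maketrans({
    "\u0643": "\u06A9",  # Arabic kaf -> keheh, the form commonly used in Kurdish
    "\u064A": "\u06CC",  # Arabic yeh -> Farsi yeh
    "\u0649": "\u06CC",  # alef maksura -> Farsi yeh
    "\u0640": None,      # tatweel (elongation mark) is removed
})

URL_RE = re.compile(r"https?://\S+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_comment(text):
    """Normalize one raw comment using the assumed rules above."""
    text = URL_RE.sub(" ", text)          # drop links
    text = text.translate(CHAR_MAP)       # replace or delete characters
    return WHITESPACE_RE.sub(" ", text).strip()  # collapse whitespace


raw = "\u0643\u0640\u064A  https://example.com  abc"
print(clean_comment(raw))
```

Normalizing variant code points in this way makes visually identical words compare as equal, which matters for short texts where each token carries a large share of the signal.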
