Comparative Performance of Machine Learning Algorithms for Early Genetic Disorder and Subclass Classification
Abu Bakar Siddik, Faisal R. Badal, Afroza Islam

TL;DR
This study evaluates machine learning algorithms on clinical data to classify genetic disorders early in life, achieving up to 80% accuracy, and highlights the potential for timely diagnosis and intervention.
Contribution
It introduces a machine learning approach that uses basic clinical indicators for early genetic disorder classification, with separately optimized models for disorder classes and subtypes.
Findings
CatBoost achieved 77% accuracy for disorder classes
SVM attained 80% accuracy for disorder subtypes
Models demonstrate feasibility for early diagnosis using simple clinical data
Abstract
Much effort has been devoted to identifying individual genetic disorders, but classifying them across a broad spectrum of disorder classes and types remains elusive. Early diagnosis of genetic disorders enables timely intervention and improves outcomes. This study implements machine learning models that use basic clinical indicators measurable at birth or in infancy, enabling diagnosis at an early stage of life. Supervised learning algorithms were applied to a dataset of 22,083 instances with 42 features, such as family history, newborn metrics, and basic lab tests. Extensive hyperparameter tuning, feature engineering, and feature selection were performed. Two multi-class classifiers were developed: one predicting the disorder class (mitochondrial, multifactorial, or single-gene) and one predicting the subtype (9 disorders). Performance was evaluated using accuracy, precision, recall, and…
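The evaluation step for a multi-class classifier can be illustrated with a minimal, stdlib-only sketch. It assumes macro averaging over classes (the abstract does not say which averaging was used), and it runs on made-up labels for the three disorder classes. The helper name `classification_metrics` is hypothetical, and the numbers it produces are not results from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision and recall for multi-class labels.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # this class was predicted, but wrongly
            fn[true_label] += 1  # the true class was missed

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in labels:
        # Treat 0/0 as 0.0 so a class that is never predicted does not raise an error.
        precisions.append(tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0)
        recalls.append(tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / len(labels),
        "macro_recall": sum(recalls) / len(labels),
    }


# Made-up labels for the three disorder classes (not data from the paper).
y_true = ["mitochondrial", "mitochondrial", "multifactorial",
          "single-gene", "single-gene", "single-gene"]
y_pred = ["mitochondrial", "single-gene", "multifactorial",
          "single-gene", "single-gene", "multifactorial"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On these toy labels, accuracy is 4/6 while macro precision and macro recall are both 13/18 (about 0.72). Macro averaging gives every class equal weight regardless of how many instances it has, so it can diverge from plain accuracy when some classes are rare.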
Taxonomy
Topics: Artificial Intelligence in Healthcare
Methods: Support Vector Machine
