Detecting Chronic Kidney Disease (CKD) at the Initial Stage: A Novel Hybrid Feature-selection Method and Robust Data Preparation Pipeline for Different ML Techniques
Md. Taufiqul Haque Khan Tusar, Md. Touhidul Islam, Foyjul Islam Raju

TL;DR
This paper introduces a comprehensive data preparation pipeline and a hybrid feature selection method to improve early detection of CKD using various machine learning models, achieving perfect accuracy with Random Forest.
Contribution
It presents a novel hybrid feature selection technique and a structured data pipeline tailored for medical data, enhancing early CKD detection with high accuracy.
Findings
Random Forest achieved 100% accuracy in CKD detection.
The proposed pipeline effectively handles missing data, outliers, and data imbalance.
Hybrid feature selection reduces redundant features and improves model performance.
Abstract
Chronic Kidney Disease (CKD) affects almost 800 million people around the world, and around 1.7 million people die each year because of it. Detecting CKD at the initial stage is essential for saving millions of lives. Many researchers have applied various Machine Learning (ML) methods to detect CKD at an early stage, but detailed studies are still missing. We present a structured and thorough method for handling the complexities of medical data with optimal performance. This study will also help researchers form a clear picture of the medical data preparation pipeline. In this paper, we applied KNN Imputation to impute missing values, Local Outlier Factor to remove outliers, SMOTE to handle data imbalance, stratified K-fold Cross-validation to validate the ML models, and a novel hybrid feature selection method to remove redundant features. Applied algorithms in this…
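The preparation steps named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy data, all hyperparameters, and the helper names `smote_like`, `evaluate`, and `make_toy_data` are hypothetical. To stay within scikit-learn, the oversampling step is a simplified SMOTE-style interpolation for two classes, not a reference SMOTE implementation. The hybrid feature selection step is omitted because the text above does not describe it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors


def smote_like(X, y, k=5, seed=0):
    """Balance two classes by interpolating minority points toward neighbours.

    Simplified stand-in for SMOTE (assumption), written for binary labels.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    X_min = X[y == minority]
    if n_new == 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def evaluate(X, y, n_splits=5, seed=0):
    """Return per-fold accuracy of a random forest under stratified K-fold."""
    scores = []
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        # Fit the imputer on the training fold only, then apply it to both.
        imputer = KNNImputer(n_neighbors=5).fit(X_tr)
        X_tr, X_te = imputer.transform(X_tr), imputer.transform(X_te)
        # Local Outlier Factor labels inliers as 1 and outliers as -1.
        inliers = LocalOutlierFactor(n_neighbors=20).fit_predict(X_tr) == 1
        # Oversample the training fold only; the test fold is left untouched.
        X_tr, y_tr = smote_like(X_tr[inliers], y_tr[inliers], seed=seed)
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        scores.append(model.fit(X_tr, y_tr).score(X_te, y_te))
    return np.array(scores)


def make_toy_data(n=300, seed=0):
    """Synthetic imbalanced data with about 5% missing cells (not CKD data)."""
    rng = np.random.default_rng(seed)
    y = (rng.random(n) < 0.3).astype(int)
    X = rng.normal(size=(n, 6)) + 1.5 * y[:, None]
    X[rng.random(X.shape) < 0.05] = np.nan
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(evaluate(X, y).mean())
```

In this sketch, imputation, outlier removal, and oversampling are all fitted inside each training fold, so no information from the held-out fold reaches the model before scoring. Whether the paper orders the steps this way is not stated in the abstract.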
Taxonomy
Topics: Artificial Intelligence in Healthcare
Methods: Feature Selection · Logistic Regression · Synthetic Minority Over-sampling Technique
