Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease
Hubert Haoyang Duan

TL;DR
This thesis studies the prediction of coronary artery disease from genetic SNP data. It compares supervised learning algorithms, introduces a new feature selection method, and reports its best results with Random Forest combined with MTD Feature Selection.
Contribution
It introduces the Mass Transportation Distance (MTD) feature selection method and compares its effectiveness with Random Projections in predicting coronary artery disease.
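To make the idea of a transportation-distance feature score concrete, here is a minimal sketch, not the thesis's definition. It assumes each SNP is encoded as an ordered value in {0, 1, 2}, that labels are 1 for cases and 0 for controls, and that a feature's score is the one-dimensional mass transportation (Wasserstein-1) distance between its case and control distributions. The function names `class_distribution`, `mtd_1d`, and `rank_features` are hypothetical.

```python
from collections import Counter


def class_distribution(values, levels=(0, 1, 2)):
    """Empirical distribution of one discrete feature over the given levels."""
    if not values:
        raise ValueError("each class needs at least one sample")
    counts = Counter(values)
    return [counts[v] / len(values) for v in levels]


def mtd_1d(p, q):
    """Wasserstein-1 distance between two distributions on unit-spaced points.

    On the line, this equals the sum of absolute differences between
    the two cumulative distribution functions.
    """
    dist = 0.0
    cum = 0.0
    for p_i, q_i in zip(p[:-1], q[:-1]):
        cum += p_i - q_i
        dist += abs(cum)
    return dist


def rank_features(X, y, levels=(0, 1, 2)):
    """Return (score, feature_index) pairs, highest score first.

    Assumption: a larger distance between the case and control
    distributions marks a more informative feature.
    """
    scores = []
    for j in range(len(X[0])):
        cases = [row[j] for row, label in zip(X, y) if label == 1]
        controls = [row[j] for row, label in zip(X, y) if label == 0]
        score = mtd_1d(class_distribution(cases, levels),
                       class_distribution(controls, levels))
        scores.append((score, j))
    return sorted(scores, reverse=True)


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0, 0], [0, 1], [2, 0], [2, 1]]
y = [0, 0, 1, 1]
ranking = rank_features(X, y)
```

Under these assumptions, keeping the top-ranked indices from `ranking` would give the reduced feature set passed to a classifier.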
Findings
MTD Feature Selection with Random Forest achieves 66.6% accuracy.
Area under ROC curve reaches 0.8562, surpassing previous results.
Random Forest with MTD Feature Selection outperforms k-NN with Random Projections on this dataset.
Abstract
From a fresh data science perspective, this thesis discusses the prediction of coronary artery disease based on genetic variations at the DNA base pair level, called Single-Nucleotide Polymorphisms (SNPs), collected from the Ontario Heart Genomics Study (OHGS). First, the thesis explains two commonly used supervised learning algorithms, the k-Nearest Neighbour (k-NN) and Random Forest classifiers, and includes a complete proof that the k-NN classifier is universally consistent in any finite dimensional normed vector space. Second, the thesis introduces two dimensionality reduction steps, Random Projections, a known feature extraction technique based on the Johnson-Lindenstrauss lemma, and a new method termed Mass Transportation Distance (MTD) Feature Selection for discrete domains. Then, this thesis compares the performance of Random Projections with the k-NN classifier against MTD…
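As a rough illustration of one pipeline the abstract names, the sketch below applies a Gaussian random projection and then classifies with k-NN by majority vote. It is a generic textbook version, not the thesis's implementation. The Gaussian matrix with 1/sqrt(k) scaling, the Euclidean distance, and the names `random_projection_matrix`, `project`, and `knn_predict` are assumptions made for this example.

```python
import math
import random
from collections import Counter


def random_projection_matrix(d, k, seed=0):
    """k x d matrix of N(0, 1) entries scaled by 1/sqrt(k).

    This scaling preserves squared Euclidean norms in expectation,
    in the spirit of the Johnson-Lindenstrauss lemma.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Map a d-dimensional point x to k dimensions via R."""
    return [sum(r_i * x_i for r_i, x_i in zip(row, x)) for row in R]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy usage: project 10-dimensional points down to 4, then classify.
R = random_projection_matrix(d=10, k=4)
train_X = [[0] * 10, [1] + [0] * 9, [2] * 10, [2] * 9 + [1]]
train_y = [0, 0, 1, 1]
reduced = [project(R, x) for x in train_X]
prediction = knn_predict(reduced, train_y, project(R, [2] * 10), k=1)
```

The same matrix `R` has to be applied to both training and query points, since k-NN compares distances in the projected space.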
Taxonomy
Topics: Artificial Intelligence in Healthcare
Methods: k-Nearest Neighbors
