Feature set optimization by clustering, univariate association, Deep & Machine learning omics Wide Association Study (DMWAS) for Biomarkers discovery as tested on GTEx pilot dataset for death due to heart attack
Abhishek Narain Singh

TL;DR
This study introduces a clustering-based encoding scheme for structural genomic variations and applies deep and machine learning methods to identify biomarkers for death due to heart attack using GTEx data.
Contribution
It presents a novel clustering approach for encoding structural variations and demonstrates its effectiveness in genomic association studies with deep learning.
Findings
High accuracy in predicting heart attack death phenotype using DMWAS methods
Logistic regression outperformed other models in this context
Identified top genomic variants most associated with the phenotype
Abstract
Univariate and multivariate methods for associating genomic variations with the end-or-endo phenotype have been widely used in genome-wide association studies. In addition to encoding SNPs, we advocate clustering as a novel method to encode structural variations (SVs) in genomes, such as deletion and insertion polymorphisms (DIPs), copy number variations (CNVs), translocations, and inversions, so that each can be used as an independent feature variable value for downstream computation by artificial intelligence methods to predict the endo-or-end phenotype. We introduce a clustering-based encoding scheme for structural variations and omics-based analysis. We conducted a complete association of all genomic variants with the phenotype using deep learning and other machine learning techniques, though other methods such as genetic algorithms can also be applied.…
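The idea of encoding a structural variation by clustering can be illustrated with a minimal sketch. It is not the paper's implementation: the gap-based 1-D clustering rule, the function names `cluster_1d` and `encode_sv_feature`, the `max_gap` parameter, and the sample deletion lengths are all assumptions made for illustration. Each sample receives an integer feature value for one SV locus: 0 when no variant is observed, otherwise the label of the cluster its variant length falls into.

```python
from typing import Dict, List, Optional


def cluster_1d(values: List[float], max_gap: float) -> Dict[float, int]:
    """Group values into clusters along one dimension.

    Values are sorted; a new cluster starts whenever the gap to the
    previous value exceeds max_gap. Returns value -> 1-based cluster label.
    (Hypothetical clustering rule, chosen only to keep the sketch simple.)
    """
    labels: Dict[float, int] = {}
    label = 0
    prev: Optional[float] = None
    for v in sorted(set(values)):
        if prev is None or v - prev > max_gap:
            label += 1
        labels[v] = label
        prev = v
    return labels


def encode_sv_feature(
    sample_lengths: Dict[str, Optional[float]], max_gap: float
) -> Dict[str, int]:
    """Encode one SV locus as an integer feature per sample.

    0 means the sample carries no variant at this locus; k >= 1 is the
    cluster label of the sample's observed variant length.
    """
    observed = [x for x in sample_lengths.values() if x is not None]
    labels = cluster_1d(observed, max_gap)
    return {s: (0 if x is None else labels[x]) for s, x in sample_lengths.items()}


# Hypothetical deletion lengths (bp) at a single locus; None = no deletion.
lengths = {"S1": 102.0, "S2": 98.0, "S3": None, "S4": 1510.0, "S5": 1495.0}
encoded = encode_sv_feature(lengths, max_gap=50.0)
print(encoded)
```

The resulting integer column could sit alongside SNP encodings as one feature in a sample-by-variant matrix passed to a downstream classifier such as logistic regression. Any real pipeline would need a clustering method and distance threshold suited to the variant type.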
Taxonomy
Topics: Genetic Associations and Epidemiology · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks
Methods: Logistic Regression
