Statistical methods of SNP data analysis with applications
Alexander Bulinski (LPMA), Oleg Butkovsky, Alexey Shashkin, Pavel Yaskov (MIRAS)

TL;DR
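This paper reviews and develops statistical methods for analyzing SNP data, namely multifactor dimensionality reduction (MDR), logic regression, random forests, and stochastic gradient boosting, and applies them to risk assessment for complex diseases such as ischemic heart disease and myocardial infarction, with the computations run on a supercomputer.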
Contribution
It introduces new modifications to existing methods, such as the MDR with 'independent rule', for improved genetic data analysis.
Findings
Identified significant SNP combinations related to cardiovascular diseases
Demonstrated the effectiveness of advanced statistical methods on large datasets
Utilized supercomputing resources for complex genetic data analysis
Abstract
Various statistical methods important for genetic analysis are considered and developed. Namely, we concentrate on multifactor dimensionality reduction (MDR), logic regression, random forests, and stochastic gradient boosting. These methods and their new modifications, e.g., the MDR method with the "independent rule", are used to study the risk of complex diseases such as cardiovascular diseases. The roles of certain combinations of single nucleotide polymorphisms and external risk factors are examined. To perform the data analysis concerning ischemic heart disease and myocardial infarction, the supercomputer SKIF "Chebyshev" of the Lomonosov Moscow State University was employed.
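For readers unfamiliar with multifactor dimensionality reduction, the following is a minimal sketch of the basic cell-labelling step as it is commonly described for two SNPs. It is not the authors' implementation and does not include their "independent rule" modification. The function names, the 0/1/2 genotype coding, and the choice of the overall case/control ratio as the threshold are assumptions made for illustration.

```python
import numpy as np

def mdr_high_risk_cells(genotypes, is_case):
    """Label each two-SNP genotype cell as high risk (True) or low risk (False).

    genotypes: integer array, shape (n_samples, 2), values in {0, 1, 2}
               (assumed coding: number of minor alleles at each SNP)
    is_case:   boolean array, shape (n_samples,), True for cases
    """
    genotypes = np.asarray(genotypes, dtype=int)
    is_case = np.asarray(is_case, dtype=bool)

    # Count cases and controls in each of the 3 x 3 genotype cells.
    cases = np.zeros((3, 3))
    controls = np.zeros((3, 3))
    np.add.at(cases, (genotypes[is_case, 0], genotypes[is_case, 1]), 1)
    np.add.at(controls, (genotypes[~is_case, 0], genotypes[~is_case, 1]), 1)

    # Assumed threshold: the overall case/control ratio in the sample.
    threshold = is_case.sum() / (~is_case).sum()

    # A cell is high risk when its case/control ratio exceeds the threshold;
    # written as a product so that cells without controls need no division.
    return cases > threshold * controls

def mdr_predict(high_risk, genotypes):
    """Predict case status by looking up each sample's genotype cell."""
    genotypes = np.asarray(genotypes, dtype=int)
    return high_risk[genotypes[:, 0], genotypes[:, 1]]
```

In a full MDR analysis this labelling would be repeated for every candidate SNP combination, with the prediction error of each combination estimated by cross-validation; that outer loop is omitted here.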
Taxonomy
Topics: Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks · Genetic Associations and Epidemiology
