Variable selection and missing data imputation in categorical genomic data analysis by integrated ridge regression and random forest
Siru Wang, Guoqi Qian

TL;DR
This paper presents an integrated approach that combines random forest, weighted ridge regression with the EM algorithm, and hypothesis testing to perform variable selection and missing data imputation in large-scale, incomplete GWAS data.
Contribution
It introduces a coherent method for phenotype-genotype association analysis that handles non-ignorable missing data in categorical GWAS data, with the aim of improving variable selection accuracy and reducing bias.
Findings
Validated with simulated GWAS data showing improved variable selection accuracy.
Successfully applied to breast cancer GWAS data revealing significant associations.
Demonstrated robustness of the method in handling non-ignorable missingness.
Abstract
Genomic data arising from a genome-wide association study (GWAS) are often not only large-scale but also incomplete. A specific form of this incompleteness is missing values with a non-ignorable missingness mechanism. The intrinsic complications of genomic data present significant challenges in developing an unbiased and informative procedure for phenotype-genotype association analysis by a statistical variable selection approach. In this paper we develop a coherent procedure for categorical phenotype-genotype association analysis, in the presence of missing values with a non-ignorable missingness mechanism in GWAS data, by integrating the state-of-the-art methods of random forest for variable selection, weighted ridge regression with the EM algorithm for missing data imputation, and linear statistical hypothesis testing for determining the missingness mechanism. Two simulated GWAS are used…
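To make the notion of a non-ignorable missingness mechanism concrete, the following is a minimal sketch and not the authors' procedure. It assumes genotypes coded as 0, 1 or 2 minor alleles, drawn under Hardy-Weinberg equilibrium, and a made-up missingness probability that depends on the genotype value itself. All function names, the allele frequency, and the probabilities in `MISS_PROB` are hypothetical choices for illustration only.

```python
import random


def simulate_genotypes(n, maf, rng):
    """Draw n genotypes (0, 1 or 2 minor alleles) under Hardy-Weinberg equilibrium."""
    return [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]


def mask_non_ignorable(genotypes, miss_prob, rng):
    """Set each genotype to None with a probability that depends on its own value."""
    return [None if rng.random() < miss_prob[g] else g for g in genotypes]


def allele_frequency(genotypes):
    """Estimate the minor allele frequency from the non-missing genotypes only."""
    observed = [g for g in genotypes if g is not None]
    return sum(observed) / (2 * len(observed))


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical mechanism: genotypes carrying more minor alleles go missing more often.
MISS_PROB = {0: 0.05, 1: 0.10, 2: 0.40}

full = simulate_genotypes(20000, 0.30, rng)
masked = mask_non_ignorable(full, MISS_PROB, rng)

full_maf = allele_frequency(full)             # estimate from the complete data
complete_case_maf = allele_frequency(masked)  # estimate after dropping missing values

print(f"allele frequency, complete data:     {full_maf:.3f}")
print(f"allele frequency, missing discarded: {complete_case_maf:.3f}")
```

Because the chance of a value being missing depends on the unobserved value, simply discarding missing entries underestimates the allele frequency in this toy setting. This is the kind of bias that motivates modelling the missingness mechanism during imputation rather than ignoring it.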
Taxonomy
Topics: Genetic and phenotypic traits in livestock · Genetic Associations and Epidemiology · Gene expression and cancer classification
